Restoring Cassandra from snapshot getting an Error, and discrepancy in the node count after restoring the data #4724

Open
rg2609 opened this issue Nov 11, 2024 · 0 comments

rg2609 commented Nov 11, 2024

We are running JanusGraph in Docker, backed by a two-node Cassandra cluster and a two-node Elasticsearch cluster. The Compose file is as follows:

version: "3"

services:
  janusgraph:
    image: local-janusgraph:latest
    container_name: jce-janusgraphdb
    environment:
      JANUS_PROPS_TEMPLATE: cql-es
      janusgraph.storage.hostname: jce-cassandra-1,jce-cassandra-2
      janusgraph.index.search.hostname: jce-elastic-1,jce-elastic-2
    ports:
      - "8182:8182"
    networks:
      - jce-network
    volumes:
      - janusgraph-data:/var/lib/janusgraph  # Mounts a volume to JanusGraph

  cassandra-1:
    image: cassandra:3
    container_name: jce-cassandra-1
    environment:
      CASSANDRA_SEEDS: "jce-cassandra-1,jce-cassandra-2"
      CASSANDRA_CLUSTER_NAME: "janusgraph-cluster"
    networks:
      - jce-network
    ports:
      - "9042:9042"
      - "9160:9160"
    volumes:
      - cassandra1-data:/var/lib/cassandra  # Mounts a volume to Cassandra

  cassandra-2:
    image: cassandra:3
    container_name: jce-cassandra-2
    environment:
      CASSANDRA_SEEDS: "jce-cassandra-1,jce-cassandra-2"
      CASSANDRA_CLUSTER_NAME: "janusgraph-cluster"
    networks:
      - jce-network
    volumes:
      - cassandra2-data:/var/lib/cassandra  # Mounts a volume to Cassandra

  elasticsearch-1:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.6.0
    container_name: jce-elastic-1
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "network.host=0.0.0.0"
      - "discovery.zen.ping.unicast.hosts=jce-elastic-1,jce-elastic-2"
    ports:
      - "9200:9200"
    networks:
      - jce-network
    volumes:
      - esdata1:/usr/share/elasticsearch/data  # Mounts a volume to Elasticsearch

  elasticsearch-2:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.6.0
    container_name: jce-elastic-2
    environment:
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "network.host=0.0.0.0"
      - "discovery.zen.ping.unicast.hosts=jce-elastic-1,jce-elastic-2"
    networks:
      - jce-network
    volumes:
      - esdata2:/usr/share/elasticsearch/data  # Mounts a volume to Elasticsearch

networks:
  jce-network:

volumes:
  janusgraph-data:
  cassandra1-data:
  cassandra2-data:
  esdata1:
  esdata2:

The Dockerfile is as follows:

FROM docker.io/janusgraph/janusgraph:latest

WORKDIR /opt/janusgraph
USER root

# Copy configuration files
COPY janusgraph-server.yaml /opt/janusgraph/conf/janusgraph-server.yaml
COPY empty-sample.groovy /opt/janusgraph/scripts/empty-sample.groovy
COPY janusgraph-keyspace-one.properties /opt/janusgraph/conf/janusgraph-keyspace-one.properties
COPY janusgraph-keyspace-two.properties /opt/janusgraph/conf/janusgraph-keyspace-two.properties

# Set ownership for the entire conf directory in one step
# RUN chown -R janusgraph:janusgraph /opt/janusgraph/conf

USER janusgraph

WORKDIR /opt/janusgraph

We attempted to obtain fresh graph data, took a snapshot of keyspace_one, and then dropped the keyspace by running the following command.

DROP KEYSPACE keyspace_two;
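
The post does not show how the snapshot itself was taken. As a rough sketch, assuming the standard `nodetool snapshot` command and the container names from the Compose file above, a snapshot could be taken on every Cassandra node as shown below. The tag `pre_drop` and the helper names are hypothetical. `nodetool snapshot` only covers the node it runs on, and with `replication_factor` 1 on a two-node cluster each node holds only part of the data, so both nodes probably need to be included.

```shell
# Sketch only: container names are taken from the Compose file above,
# the function names and the snapshot tag are made up for illustration.
NODES="jce-cassandra-1 jce-cassandra-2"

# Build the snapshot command for a single node.
snapshot_cmd() {
  local node="$1" keyspace="$2" tag="$3"
  printf 'docker exec %s nodetool snapshot -t %s -- %s\n' "$node" "$tag" "$keyspace"
}

# Print the command for every node; set RUN=1 to execute instead of printing.
snapshot_all() {
  local keyspace="$1" tag="$2" node cmd
  for node in $NODES; do
    cmd="$(snapshot_cmd "$node" "$keyspace" "$tag")"
    if [ "${RUN:-0}" = "1" ]; then
      $cmd
    else
      echo "$cmd"
    fi
  done
}

# Usage (dry run):
#   snapshot_all keyspace_two pre_drop
```

Running it without `RUN=1` only prints the commands, which makes it easy to review them before touching the cluster.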

We then recreated the keyspace by executing the following command.

CREATE KEYSPACE IF NOT EXISTS keyspace_one WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

We then ran the following commands to copy the snapshot files back into the table directories, recreate the schema, and load the restored data for keyspace_one:

  1. Restore
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd edgestore-926ec1209d9311efa9ba31a44f2d5d77 && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd edgestore_lock_-92aa91a09d9311ef8a912791ce557ac0 && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd graphindex-93097a809d9311efa9ba31a44f2d5d77/ && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd graphindex_lock_-9343eb709d9311ef8a912791ce557ac0 && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd janusgraph_ids-9213cfe09d9311ef8a912791ce557ac0 && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd systemlog-93d52ef09d9311efa9ba31a44f2d5d77 && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd system_properties-91c6c1509d9311efa9ba31a44f2d5d77/ && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd system_properties_lock_-942cebe09d9311efa9ba31a44f2d5d77/ && cp ./snapshots/1731070358151/* . && cd -
/var/lib/cassandra/data/keyspace_two
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cd txlog-93989b209d9311efa9ba31a44f2d5d77 && cp ./snapshots/1731070358151/* . && cd -

  2. Schema creation
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f edgestore-926ec1209d9311efa9ba31a44f2d5d77/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f edgestore_lock_-92aa91a09d9311ef8a912791ce557ac0/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f graphindex-93097a809d9311efa9ba31a44f2d5d77/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f graphindex_lock_-9343eb709d9311ef8a912791ce557ac0/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f janusgraph_ids-9213cfe09d9311ef8a912791ce557ac0/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f systemlog-93d52ef09d9311efa9ba31a44f2d5d77/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f system_properties-91c6c1509d9311efa9ba31a44f2d5d77/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f system_properties_lock_-942cebe09d9311efa9ba31a44f2d5d77/schema.cql >> null
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# cqlsh -f txlog-93989b209d9311efa9ba31a44f2d5d77/schema.cql >> null

  3. Nodetool refresh
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two edgestore
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two edgestore_lock_
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two graphindex
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two graphindex_lock_
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two janusgraph_ids
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two systemlog
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two system_properties
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two system_properties_lock_
root@379dc32f07c7:/var/lib/cassandra/data/keyspace_two# nodetool refresh -- keyspace_two txlog
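
The three steps above repeat the same commands for each table, so they could be collapsed into one loop. The following is a minimal sketch under the assumption that every table directory is named `<table>-<id>` and contains `snapshots/<tag>/` and `schema.cql`, as in the transcript. The function names are hypothetical. It prints the commands by default and would have to be run inside each Cassandra container, since `nodetool refresh` only loads SSTables on the local node.

```shell
# Derive the table name from a data directory such as
# "edgestore-926ec1209d9311efa9ba31a44f2d5d77" by dropping the trailing "-<id>".
table_name_from_dir() {
  local dir="${1%/}"          # tolerate a trailing slash
  dir="${dir##*/}"            # keep only the last path component
  printf '%s\n' "${dir%-*}"   # strip everything from the last hyphen onward
}

# For every table directory that has the given snapshot tag:
# copy the snapshot files back, recreate the schema, and refresh the table.
# Prints the commands unless RUN=1 is set.
restore_keyspace() {
  local data_dir="$1" keyspace="$2" tag="$3" dir table
  for dir in "$data_dir/$keyspace"/*/; do
    [ -d "${dir}snapshots/$tag" ] || continue   # skip tables without this snapshot
    table="$(table_name_from_dir "$dir")"
    if [ "${RUN:-0}" = "1" ]; then
      cp "${dir}snapshots/$tag"/* "$dir" \
        && cqlsh -f "${dir}schema.cql" \
        && nodetool refresh -- "$keyspace" "$table"
    else
      echo "cp ${dir}snapshots/$tag/* $dir"
      echo "cqlsh -f ${dir}schema.cql"
      echo "nodetool refresh -- $keyspace $table"
    fi
  done
}

# Usage (dry run), with the path and tag from the transcript above:
#   restore_keyspace /var/lib/cassandra/data keyspace_two 1731070358151
```

Stripping from the last hyphen keeps table names such as `edgestore_lock_` intact, because the id suffix itself contains no hyphens.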

After the restore, the node count no longer matches the count from before the snapshot.

Node count before the snapshot:

gremlin> g1.V().count();
==>3354

Node count after restoring the snapshot:

gremlin> g1.V().count();
==>1592
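
To put the two counts in proportion, here is a small arithmetic sketch (the helper name is hypothetical) that computes what share of the original vertices is visible after the restore:

```shell
# Percentage of vertices still present after the restore (integer division).
restored_percent() {
  local before="$1" after="$2"
  echo $(( $after * 100 / $before ))
}

# Counts taken from the Gremlin output above:
#   restored_percent 3354 1592    # prints 47
```

So a little under half of the original vertices are returned by the restored graph.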

We attempted to fetch some data from the restored keyspace, but we encountered the following error:

gremlin> g1.V().hasLabel("People").values("title");
Could not find type for id: 60941
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.NullPointerException: Could not find type for id: 60941
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:994)
	at org.janusgraph.graphdb.types.vertices.JanusGraphSchemaVertex.name(JanusGraphSchemaVertex.java:73)
	at org.janusgraph.graphdb.vertices.AbstractVertex.label(AbstractVertex.java:122)
	at org.janusgraph.graphdb.types.system.ImplicitKey.computeProperty(ImplicitKey.java:94)
	at org.janusgraph.graphdb.query.vertex.BasicVertexCentricQueryBuilder.executeImplicitKeyQuery(BasicVertexCentricQueryBuilder.java:236)
	at org.janusgraph.graphdb.query.vertex.VertexCentricQueryBuilder.properties(VertexCentricQueryBuilder.java:119)
	at org.janusgraph.graphdb.util.ElementHelper.getValues(ElementHelper.java:48)
	at org.janusgraph.graphdb.query.condition.PredicateCondition.evaluate(PredicateCondition.java:72)
	at org.janusgraph.graphdb.query.condition.And.evaluate(And.java:55)
	at org.janusgraph.graphdb.query.graph.GraphCentricQuery.matches(GraphCentricQuery.java:157)
	at org.janusgraph.graphdb.query.QueryProcessor.lambda$getFilterIterator$2(QueryProcessor.java:138)
	at org.janusgraph.graphdb.util.CloseableIteratorUtils$1.computeNext(CloseableIteratorUtils.java:51)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
	at org.janusgraph.graphdb.query.ResultSetIterator.nextInternal(ResultSetIterator.java:55)
	at org.janusgraph.graphdb.query.ResultSetIterator.<init>(ResultSetIterator.java:45)
	at org.janusgraph.graphdb.query.QueryProcessor.iterator(QueryProcessor.java:68)
	at org.janusgraph.graphdb.query.graph.GraphCentricQueryBuilder.lambda$iterables$1(GraphCentricQueryBuilder.java:240)
	at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.lambda$executeGraphCentricQuery$2(JanusGraphStep.java:203)
	at org.janusgraph.graphdb.util.ProfiledIterator.<init>(ProfiledIterator.java:36)
	at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.executeGraphCentricQuery(JanusGraphStep.java:203)
	at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.lambda$null$0(JanusGraphStep.java:106)
	at java.base/java.lang.Iterable.forEach(Unknown Source)
	at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphStep.lambda$new$1(JanusGraphStep.java:106)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.GraphStep.processNextStart(GraphStep.java:158)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:155)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
	at org.janusgraph.graphdb.tinkerpop.optimize.step.JanusGraphMultiQueryStep.processNextStart(JanusGraphMultiQueryStep.java:111)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:155)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.hasNext(ExpandableStepIterator.java:47)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.NoOpBarrierStep.processAllStarts(NoOpBarrierStep.java:67)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.NoOpBarrierStep.processNextStart(NoOpBarrierStep.java:56)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:155)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:55)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:48)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:155)
	at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:192)
	at org.apache.tinkerpop.gremlin.server.op.AbstractOpProcessor.handleIterator(AbstractOpProcessor.java:98)
	at org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor.lambda$evalOpInternal$6(AbstractEvalOpProcessor.java:267)
	at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$0(GremlinExecutor.java:283)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
gremlin> 

I suspect we are missing the backup and restore of index data in Elasticsearch.
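
If the Elasticsearch index data does need to be backed up as well, a sketch along the following lines might apply. It assumes the Elasticsearch snapshot and restore REST API with a shared-filesystem (`fs`) repository, which in turn requires `path.repo` to be configured on both Elasticsearch nodes. That setting is not part of the Compose file above. The repository name, snapshot name, location, and helper names are all hypothetical.

```shell
# Port 9200 is published by elasticsearch-1 in the Compose file above.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the URL of a snapshot inside a repository.
es_snapshot_url() {
  printf '%s/_snapshot/%s/%s\n' "$ES_URL" "$1" "$2"
}

# Register a shared-filesystem repository (location must be listed in path.repo).
es_register_repo() {
  curl -sS -X PUT "$ES_URL/_snapshot/$1" \
    -H 'Content-Type: application/json' \
    -d "{\"type\":\"fs\",\"settings\":{\"location\":\"$2\"}}"
}

# Take a snapshot of all indices and wait until it finishes.
es_create_snapshot() {
  curl -sS -X PUT "$(es_snapshot_url "$1" "$2")?wait_for_completion=true"
}

# Restore a snapshot; existing indices with the same names must be closed or deleted first.
es_restore_snapshot() {
  curl -sS -X POST "$(es_snapshot_url "$1" "$2")/_restore"
}

# Usage (hypothetical names):
#   es_register_repo jg_backup /usr/share/elasticsearch/backup
#   es_create_snapshot jg_backup snap_1
#   es_restore_snapshot jg_backup snap_1
```

Taking the Elasticsearch snapshot at the same point as the Cassandra snapshot would keep the index data consistent with the graph data it refers to.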
