Checkpoint file #42
Comments
Can you share the checkpoint folder? Thank you |
@chrisrn Do you have the checkpoint folder already? |
Yes, but it contains a more complex graph. However, I can give you the code for converting a protobuf file into a checkpoint. Inside a protobuf file all variables are converted to constants, so you can import the graph from the protobuf, convert all constants back to variables, and export a checkpoint like this:
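A sketch of that approach (my reconstruction of the described steps, not the exact code that was posted; TF 1.x APIs, and the function name is made up):

import os

import tensorflow as tf

def pb_to_ckpt(pb_path, ckpt_dir):  # hypothetical name
    graph = tf.Graph()
    with graph.as_default():
        # Import the frozen graph, in which all variables are constants.
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_path, 'rb') as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')
        with tf.Session(graph=graph) as sess:
            # Re-create each constant as a variable, then save a checkpoint.
            const_ops = [op for op in graph.get_operations() if op.type == 'Const']
            params = [tf.get_variable(op.name, shape=op.outputs[0].get_shape())
                      for op in const_ops]
            sess.run(tf.global_variables_initializer())
            tf.train.Saver(var_list=params).save(sess, os.path.join(ckpt_dir, 'model.ckpt'))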
|
@chrisrn Thanks a lot. The code works well! |
Thanks for the convert function, but when I fine-tuned from the ckpt with the pipeline.config of ssd_mobilenet_v1_coco, tensorflow reported that weights for many tensors are missing in the fine-tuned ckpt. Can you attach your pipeline.config? |
@Dongshengjiang Have you got the pipeline.config file? |
Not yet
|
@chrisrn Thanks for the conversion function. I realized that the conversion uses only a single graph to perform all loading and saving, which causes new variables to get a '_1' suffix appended to their names. This causes several issues when attempting to load the model from checkpoint files. I modified the function as follows to restore variables under the same names they were originally stored with in the protobuf file.

import os

import tensorflow as tf

def protobuf_to_checkpoint_conversion(pb_model, ckpt_dir):
    # Import the frozen graph; in a .pb file all variables are constants.
    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_model, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')
    # Create the variables in a second graph so their names do not
    # collide with the imported constants (no '_1' suffix).
    graph2 = tf.Graph()
    with graph2.as_default():
        config = tf.ConfigProto()
        with tf.Session(graph=graph2, config=config) as sess:
            constant_ops = [op for op in graph.get_operations() if op.type == "Const"]
            params = []
            for constant_op in constant_ops:
                name = constant_op.name
                shape = constant_op.outputs[0].get_shape()
                var = tf.get_variable(name, shape=shape)
                params.append(var)
            init = tf.global_variables_initializer()
            sess.run(init)
            saver = tf.train.Saver(var_list=params)
            ckpt_path = os.path.join(ckpt_dir, 'model.ckpt')
            saver.save(sess, ckpt_path, global_step=1)
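A hypothetical invocation (both paths are placeholders):

protobuf_to_checkpoint_conversion('frozen_inference_graph.pb', './converted_ckpt')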
I am currently working on optimizing this face detector with TensorRT. I face some issues when exporting the model with object_detection.exporter.export_inference_graph from the object detection API. The specific error I get when trying to export the frozen inference graph is this:

Inspection showed that this error is due to attempting to assign tensors restored from the checkpoint to variables with different shapes in the model generated from pipeline.config. I visualized both graphs in TensorBoard and realized that the BoxPredictor_x/ClassPredictor ops have output tensors with different shapes in the checkpoint and in the config-generated model. I suppose some non-default config parameters were used. I would appreciate it if anyone can share their insights on the issue, or the config file. Thanks and best,
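One way to compare shapes is to read them straight from the checkpoint (a minimal sketch; the checkpoint path is a placeholder):

import tensorflow as tf

# Print the shape recorded for each ClassPredictor variable in the checkpoint.
reader = tf.train.NewCheckpointReader('converted_ckpt/model.ckpt-1')
for name, shape in reader.get_variable_to_shape_map().items():
    if 'ClassPredictor' in name:
        print(name, shape)
|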
Solution

First of all, the conversion function posted above is incomplete: variables are not loaded with the trained parameters. Here is the updated version of the conversion function, which loads the trained params into the variables.

import os

import numpy as np
import tensorflow as tf
from tqdm import tqdm

def protobuf_to_checkpoint_conversion(pb_model, ckpt_dir):
    # Import the frozen graph; in a .pb file all variables are constants.
    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_model, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    dummy = np.random.random((1, 512, 512, 3))
    with graph.as_default():
        config = tf.ConfigProto()
        with tf.Session(graph=graph, config=config) as sess:
            constant_ops = [op for op in graph.get_operations() if op.type == "Const"]
            # Create one variable per constant, keyed by the original name.
            vars_dict = {}
            for constant_op in constant_ops:
                name = constant_op.name
                const = constant_op.outputs[0]
                var = tf.get_variable(name, const.shape, dtype=const.dtype,
                                      initializer=tf.zeros_initializer())
                vars_dict[name] = var
            print('INFO: Initializing variables')
            init = tf.global_variables_initializer()
            sess.run(init)
            print('INFO: Loading vars')
            # Evaluate each constant and load its value into the matching variable.
            for constant_op in tqdm(constant_ops):
                name = constant_op.name
                if 'FeatureExtractor' in name or 'BoxPredictor' in name:
                    const = constant_op.outputs[0]
                    var = vars_dict[name]
                    var.load(sess.run(const, feed_dict={image_tensor: dummy}), sess)
            saver = tf.train.Saver(var_list=vars_dict)
            ckpt_path = os.path.join(ckpt_dir, 'model.ckpt')
            saver.save(sess, ckpt_path)
    return graph, vars_dict

If the variables are not loaded, randomly initialized values will be saved and later restored. Moreover, I solved the above issue by setting num_classes = 2 in the pipeline.config file. For the object detection API this means that, apart from the background class, there are two more classes. This confuses me, because the idea behind a binary object detector is that it has two classes: the object and the background. Please shed some light on why num_classes should be 2 instead of 1. I have the ckpt and config file now; reach out if you need them.
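For context, num_classes sits in the model block of pipeline.config; a minimal excerpt (values illustrative, rest of the file omitted):

model {
  ssd {
    num_classes: 2
    # ... feature extractor, box predictor, anchor settings, etc.
  }
}
|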
I need it very much, thank you! |
@yoyomolinas Thanks |
Here is the config file for all the people who requested it: @hsulin0806, @deimsdeutsch. |
EDIT: I'll leave this here in case anyone encounters the same problem. It was complaining about there being no key named "global_step", so I manually inserted one. This is just @yoyomolinas' code where I also insert a new item into the vars_dict dictionary, sketched below.
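A minimal sketch of that insertion (my reconstruction of the placement, not the exact edit):

# Inside protobuf_to_checkpoint_conversion, before tf.global_variables_initializer():
# create a global_step variable so export tools can find and restore it.
global_step = tf.get_variable('global_step', shape=[], dtype=tf.int64,
                              initializer=tf.zeros_initializer())
vars_dict['global_step'] = global_step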
@yoyomolinas I can successfully generate the model.ckpt files using your code, however when using that checkpoint to run the export it fails, claiming:

Is it something to do with how the .ckpt files are generated? The purpose of this would be to use the generated .pb file to convert into a tflite model. Here is the complete error log:

Any help would be appreciated, thank you |
@fariagu I had the same issue. What I did was go into export.py, find the line that generates the error, and comment it out. Apparently tensorflow is trying to find and restore the global_step variable, which does not exist in the generated checkpoint file. Of course, this is a temporary solution; if you find a better way to do this, let us know. Also, do you know what the global_step variable does in a checkpoint file? |
@yoyomolinas From what I could gather, the global_step variable is a sort of counter used when generating checkpoint files. If you were to pass a global_step of 0 when saving, it would append '-0' to the file name, which then becomes model.ckpt-0. I can't say my solution is better, but the code I pasted above when I edited my comment instantiates that same variable. Thanks for replying 😄 |
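To illustrate the naming behaviour (a standalone sketch; the save path is a placeholder):

import tensorflow as tf

step = tf.get_variable('global_step', shape=[], dtype=tf.int64,
                       initializer=tf.zeros_initializer())
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Files come out as /tmp/model.ckpt-0.index, .meta, .data-*, etc.
    saver.save(sess, '/tmp/model.ckpt', global_step=0)
|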
@yoyomolinas As I read in your comments, you were trying to load this model in TensorRT. I'm trying the same thing right now. I've been able to generate a .uff file, but when I build the engine I get an error referring to the operation FILL, which is not implemented in the TensorRT engine:

[TRT] UffParser: Validator error: FeatureExtractor/MobilenetV1/zeros_6: Unsupported operation _Fill

I'm considering two possibilities: remove those operations, because I don't really see why they are there, or implement the FILL operation as a custom plugin in the TensorRT engine. Do you have any insight related to this? |
@sorny92 First of all, before converting the graph to uff: the tensorflow object detection api has an exporter tool that prepares detection graphs for deployment. This process involves removing some unnecessary ops, such as ASSERT ops and possibly the FILL op you described above; check the link I provide below for an example. Converting models to uff has strict rules. For example, if one of the tf layers is not supported by the UffParser, then you have to create a custom plugin for TensorRT. Creating a custom layer is an arduous process. Instead, I used Tensorflow's TensorRT package to optimize a tf graph with TensorRT. This package skips the TF layers not implemented in TensorRT during optimization. Although this solution is less optimal than using the converted uff model in TensorRT, I still achieved better performance than pure TF. If you are going to implement custom plugins in TensorRT, let me know; we can collaborate. |
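For reference, a minimal sketch of the TF-TRT route (TF 1.x contrib API; the graph path and output node names are assumptions based on standard object detection exports):

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Load the frozen graph, then let TF-TRT replace supported subgraphs with
# TensorRT engines while unsupported ops (e.g. Fill) stay in TensorFlow.
with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['detection_boxes', 'detection_scores',
             'detection_classes', 'num_detections'],
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16',
    minimum_segment_size=3)
|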
@yoyomolinas Oh yes, I tried that, but it seems I compiled my Tensorflow build from source against a different version of TensorRT. I will give it a try soon! Thanks for your help; I will keep you informed if I get it implemented. |
@yoyomolinas I used your code for checkpoint conversion. It's working pretty well, but I am not able to convert the exported frozen graph model to a tensorrt uff model that is runnable on jetson-inference. What might be the reason? |
Can you provide the checkpoint folder (including the meta file)? It is common now in tensorflow to import meta graphs.
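For reference, importing a meta graph in TF 1.x looks roughly like this (the checkpoint prefix is a placeholder):

import tensorflow as tf

# Rebuild the graph structure from the .meta file, then restore the weights.
saver = tf.train.import_meta_graph('ckpt/model.ckpt-1.meta')
with tf.Session() as sess:
    saver.restore(sess, 'ckpt/model.ckpt-1')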