[BUG] OTA Demo is constantly downloading and activating the same job #120

Open
fjuliofontes opened this issue Oct 21, 2024 · 10 comments

Comments

@fjuliofontes

The OTA demo seems to keep downloading the same job over and over.

@AniruddhaKanhere
Member

Hello @fjuliofontes, sorry to hear that you are having trouble with the example.

Can you please check whether the job is marked as complete on the IoT core? The demo is supposed to send a completion notification to the IoT core.

@fjuliofontes
Author

Hello @AniruddhaKanhere ,
Thank you for replying. No, the job status is "PENDING". I can get you a full text file with logs.
By the way, I'm running the master version.

@ActoryOu
Member

Hi @fjuliofontes,
While creating OTA Jobs, there are two Job run types. Which one are you using?

  • Your job will complete after deploying to the devices and groups that you chose (snapshot)
  • Your job will continue to deploy to any devices added to the groups that you chose (continuous)

Thank you.

@ActoryOu
Member

Also, could you elaborate on your scenario a little?
Has the OTA completed? Or is it failing at some point, so that the OTA job never completes?

Thank you.

@fjuliofontes
Author

Hi @ActoryOu ,
Thank you for the ongoing support.
I created a snapshot job, but let me get back to you tomorrow. I will be working on this again and will try to capture all the logs to help.

Thank you

@AniruddhaKanhere
Member

AniruddhaKanhere commented Nov 4, 2024

@fjuliofontes would you please enable all logs? I want to see why the job doesn't send a notification to IoT core. If the update completed successfully and the device rebooted, the firmware should send an update to the MQTT broker informing the cloud of completion.

You can enable MQTT logs here by defining all the following macros:

#define   CONFIG_CORE_MQTT_LOG_ERROR
#define   CONFIG_CORE_MQTT_LOG_WARN
#define   CONFIG_CORE_MQTT_LOG_INFO
#define   CONFIG_CORE_MQTT_LOG_DEBUG

@fjuliofontes
Author

Hi @AniruddhaKanhere,

I collected the logs before seeing your previous message. Can you check whether this is enough? As you can see, it keeps downloading the same job. The only way to stop it is to cancel the job on AWS.

ota_log.txt

Screenshot 2024-11-05 at 10 57 11 Screenshot 2024-11-05 at 10 58 34

@AniruddhaKanhere
Member

@fjuliofontes, thank you for the logs! I am looking at them and will get back to you.

One question though - did you generate a new image and upload that to the console? Sometimes if I find a bug and fix it and run the OTA demo, I forget to also update the image in the cloud which is to be downloaded - that causes a bad image to be downloaded.

Thanks

@GillesHaverbeke

GillesHaverbeke commented Nov 12, 2024

Hi! It seems I'm having the same issue. After completion & reboot, the PUBLISH packet to set the job to Succeeded is not accepted due to a "VersionMismatch":

I (4078) ota_over_mqtt_demo: OTA Completed successfully!

I (4158) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (4158) coreMQTT: State record updated. New state=MQTTPublishDone.
I (4168) ota_over_mqtt: Received update response: $aws/things/744DBDB372DC/jobs/AFR_OTA-DEV_241111_UPDATE_3/update/rejected{"timestamp":1731330895,"executionState":{"status":"IN_PROGRESS","statusDetails":{"self_test":"ready","updatedBy":"0x00000018"},"versionNumber":15},"code":"VersionMismatch","message":"Expected version 2 but found version 15"}0888,"versionNumber":15,"executionNumber":1,"jobDocument":{"afr_ota":{"protocols":["MQTT"],"streamname":"AFR_OTA-9fd67c0d-dddd-49d5-b1ca-8e757d2d975e","files":[{"filepath":"/","filesize":1275600,"fileid":0,"certfile":"/","sig-sha256-ecdsa":"MEEZCCBReQy8V97LtR+WqUFM85zasb6kWoOFZPe6rToYC0mEgIgIWs4pDhRxXnHjw1jyezCSa8y8c44+c6dMHaB62I7/Jg="}]}}}}.
I (4228) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (4238) coreMQTT: State record updated. New state=MQTTPublishDone.
I (4238) mqtt_communication: Publish 3: SUCCESS

After this, the code picks up the same job again and keeps updating endlessly.
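To make this failure visible on the device instead of silently looping, the rejection could be detected when the response arrives. The following is only a minimal sketch: `isVersionMismatchRejection` is a hypothetical helper that is not part of the demo, and it assumes the response topic ends in `/update/rejected` and the payload carries `"code":"VersionMismatch"` with no whitespace, as in the log above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the demo): returns true when an incoming
 * PUBLISH looks like a rejected job update caused by a version mismatch. */
static bool isVersionMismatchRejection( const char * topic,
                                        size_t topicLength,
                                        const char * payload,
                                        size_t payloadLength )
{
    static const char suffix[] = "/update/rejected";
    static const char code[] = "\"code\":\"VersionMismatch\"";
    const size_t suffixLength = sizeof( suffix ) - 1U;
    const size_t codeLength = sizeof( code ) - 1U;
    size_t i;

    if( ( topic == NULL ) || ( payload == NULL ) || ( topicLength < suffixLength ) )
    {
        return false;
    }

    /* The topic must end with the rejected-update suffix. */
    if( memcmp( topic + topicLength - suffixLength, suffix, suffixLength ) != 0 )
    {
        return false;
    }

    /* MQTT payloads are not NUL-terminated, so scan with explicit lengths. */
    for( i = 0U; ( i + codeLength ) <= payloadLength; i++ )
    {
        if( memcmp( payload + i, code, codeLength ) == 0 )
        {
            return true;
        }
    }

    return false;
}
```

A caller could log an error or stop retrying the same job when this returns true, rather than treating the successful PUBLISH acknowledgement as a successful job update.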

I think the cause is in this code:

        size_t messageBufferLength = Jobs_UpdateMsg( Succeeded,
                                                     "2",
                                                     1U,
                                                     messageBuffer,
                                                     UPDATE_JOB_MSG_LENGTH );

The hardcoded "2" does not match the version in AWS, which is 15 according to the rejection message.

Maybe a fix is to describe the job execution before updating, so the correct version can be retrieved and sent with the update?
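As a rough illustration of that idea, the current version could be read out of a response payload and used instead of the hardcoded string. This is a sketch under assumptions, not the eventual fix: `extractVersionNumber` is a hypothetical helper, it does plain substring matching rather than real JSON parsing, and it assumes the first `"versionNumber":` field in the payload (with no whitespace after the colon, as in the log above) is the one that matters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the demo): copies the digits of the first
 * "versionNumber" field found in payload into versionBuffer as a
 * NUL-terminated string. Returns the number of digits copied, or 0 if the
 * field is missing, has no digits, or does not fit in the buffer. */
static size_t extractVersionNumber( const char * payload,
                                    size_t payloadLength,
                                    char * versionBuffer,
                                    size_t versionBufferSize )
{
    static const char key[] = "\"versionNumber\":";
    const size_t keyLength = sizeof( key ) - 1U;
    size_t i;
    size_t length = 0U;

    if( ( payload == NULL ) || ( versionBuffer == NULL ) || ( versionBufferSize == 0U ) )
    {
        return 0U;
    }

    for( i = 0U; ( i + keyLength ) <= payloadLength; i++ )
    {
        if( memcmp( payload + i, key, keyLength ) == 0 )
        {
            i += keyLength;

            /* Copy consecutive digits, leaving room for the terminator. */
            while( ( i < payloadLength ) &&
                   ( payload[ i ] >= '0' ) && ( payload[ i ] <= '9' ) )
            {
                if( ( length + 1U ) >= versionBufferSize )
                {
                    /* Version does not fit: report failure, not a truncated value. */
                    length = 0U;
                    break;
                }

                versionBuffer[ length ] = payload[ i ];
                length++;
                i++;
            }

            break;
        }
    }

    versionBuffer[ length ] = '\0';
    return length;
}
```

With the rejection payload from the log, this would yield the string "15" and a length of 2, which could then take the place of the hardcoded "2" and 1U arguments in the Jobs_UpdateMsg call quoted above.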

@AniruddhaKanhere
Member

@GillesHaverbeke, yes, you are absolutely correct. If the reported job version is not correct, the IoT core will reject the update.
I will raise a PR to fix this as soon as time permits! Thanks a lot for taking the time to help debug this issue.
