Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cozy-clisk):Implementing selective file dowload and fixing metadata dedupli… #930

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

LucsT
Copy link
Contributor

@LucsT LucsT commented Mar 17, 2023

This PR bring a new exported function in cozy-clisk for flagship app.
This function is needed by the launcher to evaluate early if the file can be found on the cozy.
This code will be use by this PR on flagship app cozy/cozy-flagship-app#670

-> Separation and adaptation of function getFileIfExist in the lib
-> Evaluation of some parameters sooner in the process (folderpath, sourceAccountIdentifier, ...)
-> Adding some units tests that cover the function getFileIfExist.
-> Fixing deduplication on metadata that was not operational before due to received data structure

Need to be done before merge
-> Adapting timing wraper to this new lib function.

Should be done soon
-> Moving 'shouldreplace' sooner in the process
-> We lost the 'free' metadata update when file is already present, as we now do not update each time the file. Need to be adapt. (ex: add qualification on an existing already download file, ...)
-> At minimum, write basic saveFiles tests.

@LucsT LucsT changed the title feat[cozy-clisk]:Implementing selective file dowload and fixing metadata dedupli… feat(cozy-clisk):Implementing selective file dowload and fixing metadata dedupli… Mar 17, 2023
@LucsT LucsT force-pushed the feat/SelectiveFileDownload branch 2 times, most recently from 2c89c24 to ecb3e4e Compare March 17, 2023 10:16
@LucsT LucsT force-pushed the feat/SelectiveFileDownload branch from ecb3e4e to c61d290 Compare March 17, 2023 10:22
@LucsT LucsT marked this pull request as ready for review March 20, 2023 15:26
@LucsT LucsT requested a review from a team as a code owner March 20, 2023 15:26

module.exports = getFileIfExists

async function getFileIfExists({ client, entry, options, folderPath, slug }) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsdoc please with //ts-check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make a request to the cozy, then the method should be called fetchSomething (https://github.com/cozy/cozy-guidelines/#naming-of-functions)

Get should only be used to get something in memory


async function getFileIfExists({ client, entry, options, folderPath, slug }) {
const fileIdAttributes = options.fileIdAttributes
const sourceAccountIdentifier = options.sourceAccountIdentifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const { fileIdAttributes, sourceAccountIdentifier } = options

} else {
log.debug('Rolling back on detection by filename')
return getFileFromPath(client, entry, folderPath)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer the early return, it's a lot easier to read:

if(!isReadyForFileMedata){
return getFileFromPath
} else {
... 
...
} 

}
}

async function getFileFromMetaData(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here for naming convention. Please follow https://github.com/cozy/cozy-guidelines/#naming-of-functions

fileIdAttributes,
sourceAccountIdentifier,
slug
) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsdoc please


const isReadyForFileMetadata =
fileIdAttributes && slug && sourceAccountIdentifier
if (isReadyForFileMetadata) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me isReadyForFileMetadata means that this method will be called several times in a short period of time. But I don't think this is the meaning here. It's more hasEnoughtMetadata or something no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is to decide if enough Metadata are available.
I can change some naming, but this kind of naming was inherited from the existing saveFiles
Same for functions name that you point.

'cozyMetadata.sourceAccountIdentifier',
'cozyMetadata.createdByApp'
])
)
Copy link
Contributor

@Crash-- Crash-- Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/ We should really avoid the queryAll() method. This method can return a lot of documents and can be pretty long to execute. Can't we restrict the returned data by querying only the last 12 months or else?

2/ I don't know if you need all the fields from this io.cozy.files but you can reduce the size of the returned data by selecting them by adding : .select(['name', 'cozyMetadata.createdAt'])

3/ You queryAll() but after that, you only return the first result. You should use something like this instead:

client.query(Q('io.cozy.files').where...limitBy(1))

be careful by changing to client.query() since query() will return an object with data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the queryAll is here only to know if more than one document satisfies the request, to log a waring if more than on file does it.

My mistake in the first place.

The get the same feature, we should then use client.query with limitBy(2)

) {
log.debug(
`Checking existence of ${calculateFileKey(entry, fileIdAttributes)}`
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const fileKey = calculateFileKey(entry, fileIdAttributes)

and use fileKey later instead of calculating it 3 times

'metadata.fileIdAttributes',
'trashed',
'cozyMetadata.sourceAccountIdentifier',
'cozyMetadata.createdByApp'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the order is important (https://docs.cozy.io/en/tutorials/data/advanced/#indexes-performances-and-design) I'll changed to:

[
        'metadata.fileIdAttributes',
  'cozyMetadata.sourceAccountIdentifier',
        'cozyMetadata.createdByApp'
        'trashed'
      
]

Copy link
Contributor

@paultranvan paultranvan Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also suggest to use a partial index for the trashed attribute:

    Q('io.cozy.files')
      .where({
        metadata: {
          fileIdAttributes: calculateFileKey(entry, fileIdAttributes)
        },
        cozyMetadata: {
          sourceAccountIdentifier,
          createdByApp: slug
        }
      })
      .indexFields([
        'metadata.fileIdAttributes',
        'cozyMetadata.sourceAccountIdentifier',
        'cozyMetadata.createdByApp'
      ])
      .partialIndex({
        trashed: false
      })

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why that? I though the same at first (putting it in a partial index) but then I checked trashed attribute and it seems that is always there. So I don't understand why to put it in a partial index. :'(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's always there, it will always be indexed, but we only need the trashed: false docs. So the partial index will allow a lighter index with only non-trashed files

}
}

async function getFileFromPath(client, entry, folderPath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsdoc 🙏


async function getFileFromPath(client, entry, folderPath) {
try {
log.debug(`Checking existence of ${getFilePath({ entry, folderPath })}`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store the result of getFilePath and use it later 🙏

return files[0]
} else {
log.debug('File not found')
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should return null. Not boolean

return result.data
} catch (err) {
log.debug(err.message)
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same should return null

@Crash--
Copy link
Contributor

Crash-- commented Mar 21, 2023

Y a un truc qui me dérange avec ça, c'est qu'on fait les requêtes fichier par fichier. Et si on en a beaucoup, ça va vraiment pas être terrible.

@Crash--
Copy link
Contributor

Crash-- commented Mar 21, 2023

Et aussi, on est censé avoir un context sur les clisk des fichiers déjà récupérés etc, on devrait le passer à cet endroit là, non ?

@doubleface
Copy link
Collaborator

@Crash-- Et aussi, on est censé avoir un context sur les clisk des fichiers déjà récupérés etc, on devrait le passer à cet endroit là, non ?

On avait enlevé la récupération du contexte lorsque les fichier n'avaient pas le bon createdByApp. Et on ne l'a pas remis depuis. Je pense qu'on peut faire une carte pour ça

@doubleface
Copy link
Collaborator

@Crash-- Y a un truc qui me dérange avec ça, c'est qu'on fait les requêtes fichier par fichier. Et si on en a beaucoup, ça va vraiment pas être terrible.

Avec un index ça n'est pas sensé être suffisamment performant ? Parce que sinon, on va de toutes façons devoir se reconstruire un inde x en mémoire pour savoir si un fichier existe déjà.

En même temps, c'est vrai qu'ici on est sur un téléphone et le temps de réponse du réseau risque d'être plus élevé. Je ne sais pas vraiment dire à quel point.

Mais je suppose que c'est une bonne chose qu'on soit plus facilement capables de faire des mesures maintenant

@Crash--
Copy link
Contributor

Crash-- commented Jan 2, 2024

@doubleface @LucsT what is the status of this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants