-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(cozy-clisk):Implementing selective file dowload and fixing metadata dedupli… #930
base: master
Are you sure you want to change the base?
Conversation
2c89c24
to
ecb3e4e
Compare
…data deduplication
ecb3e4e
to
c61d290
Compare
|
||
module.exports = getFileIfExists | ||
|
||
async function getFileIfExists({ client, entry, options, folderPath, slug }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jsdoc please with //ts-check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we make a request to the cozy, then the method should be called fetchSomething (https://github.com/cozy/cozy-guidelines/#naming-of-functions)
Get should only be used to get something in memory
|
||
async function getFileIfExists({ client, entry, options, folderPath, slug }) { | ||
const fileIdAttributes = options.fileIdAttributes | ||
const sourceAccountIdentifier = options.sourceAccountIdentifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const { fileIdAttributes, sourceAccountIdentifier } = options
} else { | ||
log.debug('Rolling back on detection by filename') | ||
return getFileFromPath(client, entry, folderPath) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer the early return, it's a lot easier to read:
if(!isReadyForFileMedata){
return getFileFromPath
} else {
...
...
}
} | ||
} | ||
|
||
async function getFileFromMetaData( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here for naming convention. Please follow https://github.com/cozy/cozy-guidelines/#naming-of-functions
fileIdAttributes, | ||
sourceAccountIdentifier, | ||
slug | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jsdoc please
|
||
const isReadyForFileMetadata = | ||
fileIdAttributes && slug && sourceAccountIdentifier | ||
if (isReadyForFileMetadata) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me isReadyForFileMetadata
means that this method will be called several times in a short period of time. But I don't think this is the meaning here. It's more hasEnoughtMetadata
or something no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is to decide if enough Metadata are available.
I can change some naming, but this kind of naming was inherited from the existing saveFiles
Same for functions name that you point.
'cozyMetadata.sourceAccountIdentifier', | ||
'cozyMetadata.createdByApp' | ||
]) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1/ We should really avoid the queryAll() method. This method can return a lot of documents and can be pretty long to execute. Can't we restrict the returned data by querying only the last 12 months or else?
2/ I don't know if you need all the fields from this io.cozy.files
but you can reduce the size of the returned data by selecting them by adding : .select(['name', 'cozyMetadata.createdAt'])
3/ You queryAll() but after that, you only return the first result. You should use something like this instead:
client.query(Q('io.cozy.files').where...limitBy(1))
be careful by changing to client.query() since query() will return an object with data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the queryAll is here only to know if more than one document satisfies the request, to log a waring if more than on file does it.
My mistake in the first place.
The get the same feature, we should then use client.query with limitBy(2)
) { | ||
log.debug( | ||
`Checking existence of ${calculateFileKey(entry, fileIdAttributes)}` | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const fileKey = calculateFileKey(entry, fileIdAttributes)
and use fileKey later instead of calculating it 3 times
'metadata.fileIdAttributes', | ||
'trashed', | ||
'cozyMetadata.sourceAccountIdentifier', | ||
'cozyMetadata.createdByApp' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the order is important (https://docs.cozy.io/en/tutorials/data/advanced/#indexes-performances-and-design) I'll changed to:
[
'metadata.fileIdAttributes',
'cozyMetadata.sourceAccountIdentifier',
'cozyMetadata.createdByApp'
'trashed'
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also suggest to use a partial index for the trashed attribute:
Q('io.cozy.files')
.where({
metadata: {
fileIdAttributes: calculateFileKey(entry, fileIdAttributes)
},
cozyMetadata: {
sourceAccountIdentifier,
createdByApp: slug
}
})
.indexFields([
'metadata.fileIdAttributes',
'cozyMetadata.sourceAccountIdentifier',
'cozyMetadata.createdByApp'
])
.partialIndex({
trashed: false
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why that? I though the same at first (putting it in a partial index) but then I checked trashed
attribute and it seems that is always there. So I don't understand why to put it in a partial index. :'(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's always there, it will always be indexed, but we only need the trashed: false
docs. So the partial index will allow a lighter index with only non-trashed files
} | ||
} | ||
|
||
async function getFileFromPath(client, entry, folderPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jsdoc 🙏
|
||
async function getFileFromPath(client, entry, folderPath) { | ||
try { | ||
log.debug(`Checking existence of ${getFilePath({ entry, folderPath })}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store the result of getFilePath and use it later 🙏
return files[0] | ||
} else { | ||
log.debug('File not found') | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should return null. Not boolean
return result.data | ||
} catch (err) { | ||
log.debug(err.message) | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same should return null
Y a un truc qui me dérange avec ça, c'est qu'on fait les requêtes fichier par fichier. Et si on en a beaucoup, ça va vraiment pas être terrible. |
Et aussi, on est censé avoir un context sur les clisk des fichiers déjà récupérés etc, on devrait le passer à cet endroit là, non ? |
On avait enlevé la récupération du contexte lorsque les fichier n'avaient pas le bon createdByApp. Et on ne l'a pas remis depuis. Je pense qu'on peut faire une carte pour ça |
Avec un index ça n'est pas sensé être suffisamment performant ? Parce que sinon, on va de toutes façons devoir se reconstruire un inde x en mémoire pour savoir si un fichier existe déjà. En même temps, c'est vrai qu'ici on est sur un téléphone et le temps de réponse du réseau risque d'être plus élevé. Je ne sais pas vraiment dire à quel point. Mais je suppose que c'est une bonne chose qu'on soit plus facilement capables de faire des mesures maintenant |
@doubleface @LucsT what is the status of this PR? |
This PR bring a new exported function in cozy-clisk for flagship app.
This function is needed by the launcher to evaluate early if the file can be found on the cozy.
This code will be use by this PR on flagship app cozy/cozy-flagship-app#670
-> Separation and adaptation of function getFileIfExist in the lib
-> Evaluation of some parameters sooner in the process (folderpath, sourceAccountIdentifier, ...)
-> Adding some units tests that cover the function getFileIfExist.
-> Fixing deduplication on metadata that was not operational before due to received data structure
Need to be done before merge
-> Adapting timing wraper to this new lib function.
Should be done soon
-> Moving 'shouldreplace' sooner in the process
-> We lost the 'free' metadata update when file is already present, as we now do not update each time the file. Need to be adapt. (ex: add qualification on an existing already download file, ...)
-> At minimum, write basic saveFiles tests.