Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cozy-clisk):Implementing selective file dowload and fixing metadata dedupli… #930

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions packages/cozy-clisk/src/index.js
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ export { MessengerInterface } from './bridge/bridgeInterfaces'
export { default as ContentScript } from './contentscript/ContentScript'
export { default as addData } from './launcher/addData'
export { default as hydrateAndFilter } from './launcher/hydrateAndFilter'
export { default as getFileIfExists } from './launcher/getFileIfExists'
export { default as saveFiles } from './launcher/saveFiles'
export { default as saveBills } from './launcher/saveBills'
export { default as saveIdentity } from './launcher/saveIdentity'
Expand Down
1 change: 1 addition & 0 deletions packages/cozy-clisk/src/launcher.js
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
export { default as addData } from './launcher/addData'
export { default as hydrateAndFilter } from './launcher/hydrateAndFilter'
export { default as getFileIfExists } from './launcher/getFileIfExists'
export { default as saveFiles } from './launcher/saveFiles'
export { default as saveBills } from './launcher/saveBills'
export { default as saveIdentity } from './launcher/saveIdentity'
Expand Down
128 changes: 128 additions & 0 deletions packages/cozy-clisk/src/launcher/getFileIfExists.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
import Minilog from '@cozy/minilog'
import { Q } from 'cozy-client'
import get from 'lodash/get'

const log = Minilog('getFileIfExists')

module.exports = getFileIfExists

async function getFileIfExists({ client, entry, options, folderPath, slug }) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsdoc please with //ts-check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make a request to the cozy, then the method should be called fetchSomething (https://github.com/cozy/cozy-guidelines/#naming-of-functions)

Get should only be used to get something in memory

const fileIdAttributes = options.fileIdAttributes
const sourceAccountIdentifier = options.sourceAccountIdentifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const { fileIdAttributes, sourceAccountIdentifier } = options


const isReadyForFileMetadata =
fileIdAttributes && slug && sourceAccountIdentifier
if (isReadyForFileMetadata) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me isReadyForFileMetadata means that this method will be called several times in a short period of time. But I don't think this is the meaning here. It's more hasEnoughtMetadata or something no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is to decide if enough Metadata are available.
I can change some naming, but this kind of naming was inherited from the existing saveFiles
Same for functions name that you point.

log.debug('Detecting file with metadata')
const file = await getFileFromMetaData(
client,
entry,
fileIdAttributes,
sourceAccountIdentifier,
slug
)
if (!file) {
// no file with correct metadata, maybe the corresponding file already exist in the default
// path from a previous version of the connector
log.debug('Rolling back on detection by filename')
return getFileFromPath(client, entry, folderPath)
} else {
return file
}
} else {
log.debug('Rolling back on detection by filename')
return getFileFromPath(client, entry, folderPath)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer the early return, it's a lot easier to read:

if(!isReadyForFileMedata){
return getFileFromPath
} else {
... 
...
} 

}

async function getFileFromMetaData(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here for naming convention. Please follow https://github.com/cozy/cozy-guidelines/#naming-of-functions

client,
entry,
fileIdAttributes,
sourceAccountIdentifier,
slug
) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsdoc please

log.debug(
`Checking existence of ${calculateFileKey(entry, fileIdAttributes)}`
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const fileKey = calculateFileKey(entry, fileIdAttributes)

and use fileKey later instead of calculating it 3 times

const files = await client.queryAll(
Q('io.cozy.files')
.where({
metadata: {
fileIdAttributes: calculateFileKey(entry, fileIdAttributes)
},
trashed: false,
cozyMetadata: {
sourceAccountIdentifier,
createdByApp: slug
}
})
.indexFields([
'metadata.fileIdAttributes',
'trashed',
'cozyMetadata.sourceAccountIdentifier',
'cozyMetadata.createdByApp'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the order is important (https://docs.cozy.io/en/tutorials/data/advanced/#indexes-performances-and-design) I'll changed to:

[
        'metadata.fileIdAttributes',
  'cozyMetadata.sourceAccountIdentifier',
        'cozyMetadata.createdByApp'
        'trashed'
      
]

Copy link
Contributor

@paultranvan paultranvan Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also suggest to use a partial index for the trashed attribute:

    Q('io.cozy.files')
      .where({
        metadata: {
          fileIdAttributes: calculateFileKey(entry, fileIdAttributes)
        },
        cozyMetadata: {
          sourceAccountIdentifier,
          createdByApp: slug
        }
      })
      .indexFields([
        'metadata.fileIdAttributes',
        'cozyMetadata.sourceAccountIdentifier',
        'cozyMetadata.createdByApp'
      ])
      .partialIndex({
        trashed: false
      })

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why that? I though the same at first (putting it in a partial index) but then I checked trashed attribute and it seems that is always there. So I don't understand why to put it in a partial index. :'(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's always there, it will always be indexed, but we only need the trashed: false docs. So the partial index will allow a lighter index with only non-trashed files

])
)
Copy link
Contributor

@Crash-- Crash-- Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/ We should really avoid the queryAll() method. This method can return a lot of documents and can be pretty long to execute. Can't we restrict the returned data by querying only the last 12 months or else?

2/ I don't know if you need all the fields from this io.cozy.files but you can reduce the size of the returned data by selecting them by adding : .select(['name', 'cozyMetadata.createdAt'])

3/ You queryAll() but after that, you only return the first result. You should use something like this instead:

client.query(Q('io.cozy.files').where...limitBy(1))

be careful by changing to client.query() since query() will return an object with data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the queryAll is here only to know if more than one document satisfies the request, to log a waring if more than on file does it.

My mistake in the first place.

The get the same feature, we should then use client.query with limitBy(2)

if (files && files[0]) {
if (files.length > 1) {
log.warn(
`Found ${files.length} files corresponding to ${calculateFileKey(
entry,
fileIdAttributes
)}`
)
}
return files[0]
} else {
log.debug('File not found')
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should return null. Not boolean

}
}

async function getFileFromPath(client, entry, folderPath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsdoc 🙏

try {
log.debug(`Checking existence of ${getFilePath({ entry, folderPath })}`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store the result of getFilePath and use it later 🙏

const result = await client
.collection('io.cozy.files')
.statByPath(getFilePath({ entry, folderPath }))
return result.data
} catch (err) {
log.debug(err.message)
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same should return null

}
}

function getFilePath({ file, entry, folderPath }) {
if (file) {
return folderPath + '/' + getAttribute(file, 'name')
} else if (entry) {
return folderPath + '/' + getFileName(entry)
}
}

function getAttribute(obj, attribute) {
return get(obj, `attributes.${attribute}`, get(obj, attribute))
}

function calculateFileKey(entry, fileIdAttributes) {
return fileIdAttributes
.sort()
.map(key => get(entry, key))
.join('####')
}

function getFileName(entry) {
let filename
if (entry.filename) {
filename = entry.filename
} else {
log.error('Could not get a file name for the entry')
return false
}
return sanitizeFileName(filename)
}

function sanitizeFileName(filename) {
return filename.replace(/^\.+$/, '').replace(/[/?<>\\:*|":]/g, '')
}
129 changes: 129 additions & 0 deletions packages/cozy-clisk/src/launcher/getFileIfExists.spec.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
import getFileIfExists from './getFileIfExists'

describe('getFileIfExists', function () {
function setup() {
const statByPath = jest.fn()
const queryAll = jest.fn()
const client = {
queryAll,
collection: () => ({
statByPath
})
}
return {
statByPath,
queryAll,
client
}
}

it('should use metadata if fileIdAttributes, slug and sourceAccountIdentifier are present', async () => {
const { client, queryAll } = setup()
queryAll.mockResolvedValue([{ _id: 'abcd' }]) // Mocking the return with a simple file obj
const file = await getFileIfExists({
client,
entry: {
filename: 'filename'
},
options: {
fileIdAttributes: ['filename'],
sourceAccountIdentifier: 'Identifier'
},
folderPath: '/fakepath',
slug: 'slug'
})
// Test that the Q request cotains our parameters
expect(queryAll).toHaveBeenCalledWith(
expect.objectContaining({
selector: expect.objectContaining({
cozyMetadata: expect.objectContaining({
createdByApp: 'slug',
sourceAccountIdentifier: 'Identifier'
}),
metadata: expect.objectContaining({ fileIdAttributes: 'filename' })
})
})
)
expect(file).toStrictEqual({ _id: 'abcd' })
})

it('should roll back to filename when with metadata return nothing', async () => {
const { client, queryAll, statByPath } = setup()
queryAll.mockResolvedValue(undefined) // Mocking an empty return
statByPath.mockResolvedValue({ data: { _id: 'abcd' } })
const file = await getFileIfExists({
client,
entry: {
filename: 'filename'
},
options: {
fileIdAttributes: ['filename'],
sourceAccountIdentifier: 'Identifier'
},
folderPath: '/fakepath',
slug: 'slug'
})
expect(queryAll).toHaveBeenCalledWith(
expect.objectContaining({
selector: expect.objectContaining({
cozyMetadata: expect.objectContaining({
createdByApp: 'slug',
sourceAccountIdentifier: 'Identifier'
}),
metadata: expect.objectContaining({ fileIdAttributes: 'filename' })
})
})
)
expect(statByPath).toHaveBeenCalledWith('/fakepath/filename')
expect(file).toStrictEqual({ _id: 'abcd' })
})

it('shoud use filename if sourceAccountIdentifier is missing', async () => {
const { statByPath, client } = setup()
statByPath.mockResolvedValue({ data: { _id: 'abcd' } })
const file = await getFileIfExists({
client,
entry: {
filename: 'filename'
},
options: { fileIdAttributes: ['filename'] },
folderPath: '/fakepath',
slug: 'slug'
})
expect(statByPath).toHaveBeenCalledWith('/fakepath/filename')
expect(file).toStrictEqual({ _id: 'abcd' })
})

it('shoud use filename if slug is missing', async () => {
const { statByPath, client } = setup()
statByPath.mockResolvedValue({ data: { _id: 'abcd' } })
const file = await getFileIfExists({
client,
entry: {
filename: 'filename'
},
options: {
fileIdAttributes: ['filename'],
sourceAccountIdentifier: 'Identifier'
},
folderPath: '/fakepath'
})
expect(statByPath).toHaveBeenCalledWith('/fakepath/filename')
expect(file).toStrictEqual({ _id: 'abcd' })
})
it('shoud use filename if fileIdAttributes is missing', async () => {
const { statByPath, client } = setup()
statByPath.mockResolvedValue({ data: { _id: 'abcd' } })
const file = await getFileIfExists({
client,
entry: {
filename: 'filename'
},
options: { sourceAccountIdentifier: 'Identifier' },
folderPath: '/fakepath',
slug: 'slug'
})
expect(statByPath).toHaveBeenCalledWith('/fakepath/filename')
expect(file).toStrictEqual({ _id: 'abcd' })
})
})
Loading