
iconify
Explorer | Level 4
5 years ago

Help understanding how /files/list_folder/continue works

I am working on an AWS serverless app that queries a specific Dropbox folder tree for daily PDF uploads. My process and config/code are below. I _think_ I understand how the API endpoint is supposed to work, but the results I am seeing do not match what I expect, so the most likely explanation is that I do not actually understand how it works.

 

My App:

 

My app is simple. I watch a Dropbox folder for daily PDF uploads and, at the end of the day, download and merge all new PDFs into a single PDF. I am using the NodeJS Dropbox package here: https://www.npmjs.com/package/dropbox-v2-api

 

I have no indication that the NodeJS package is not working as it should.

 

On a given day there are between 150 and 200 PDFs, ranging from a couple of MB up to 500 MB. I'm not having any issues with the size of the PDFs. That part works great.

 

The Process:

 

  1. At 2:00 AM every morning I call the get_latest_cursor endpoint and store the cursor.
  2. At 3:00 PM every afternoon I call /files/list_folder/continue, passing the stored cursor.
  3. My config has:
    1. recursive = true
    2. include_deleted = false
    3. limit = 2000

What I expect to see is a list of all files added to the folder tree each day since the 2:00 AM cursor, excluding files with the ".tag" : "deleted" property.

 

What I am seeing is that ".tag" : "deleted" files are included in the results. So whereas my result set should be around 400 files, including supporting JPG and PSD files as well as the PDFs, I am seeing about 900 files because all of the deleted files are included even though I am explicitly excluding them.

 

 

/**
 * Get latest Dropbox cursor
 * @param event
 * @param callback
 * @returns {*}
 */
module.exports.getLatestCursor = (event, callback) => {

    const
        s3 = getAwsS3()
        , dropbox = getDropbox();

    console.log('[index.js][getLatestCursor] STEP 01 -- Get Latest Cursor')

    dropbox({
        resource: 'files/list_folder/get_latest_cursor',
        parameters: {
            path                           : process.env.DROPBOX_WATCH_FOLDER,
            recursive                      : true,
            include_deleted                : false,
            include_non_downloadable_files : false,
            include_media_info             : false,
            limit                          : 2000
        }
    }, (err, result, response) => {

        if (err) { return console.log(err); }

        console.log('[index.js][getLatestCursor] STEP 02 -- Prepare Latest Cursor', JSON.stringify(response))

        const params = {
            Bucket : process.env.S3_BUCKET_NAME,
            Key    : `cursor/${process.env.CURSOR_FILENAME}`,
            Body   : Buffer.from(JSON.stringify(response.body)),
            ACL    : 'private'
        };

        s3.upload(params, (err, data) => {
            console.log('[index.js][getLatestCursor] STEP 03 -- Save Latest Token to S3', data)
            if (err) throw err;
            callback(null, data)
        });
    });
};

Then, my call to list files:

 

 

 

/**
 * Get file list
 * @param event
 * @param callback
 * @returns {*}
 */
module.exports.getFileList = (event, callback) => {

    const
          bucket   = process.env.S3_BUCKET_NAME
        , prefix   = 'cursor'
        , filename = process.env.CURSOR_FILENAME

    const s3 = getAwsS3();

    const params = {
        Bucket: bucket,
        Key: `${prefix}/${filename}`
    }

    s3.getObject(params, (err, data) => {

        if (err) {
            console.error(err);
            throw err;
        }

        console.log('[index.js][getFileList] @@@ DATA @@@', data)

        let response;

        const
            dropbox  = getDropbox()
            , cursor = JSON.parse(data.Body.toString()).cursor // Body is a Buffer holding the JSON saved by getLatestCursor
            , params = {
                resource: 'files/list_folder/continue',
                parameters: {
                    cursor : cursor
                }
            };

        dropbox(params, (err, result, response) => {
            if (err) {
                console.error(err);
                throw err;
            }

            console.log('[index.js][getFileList] @@@ ENTRIES @@@', result)

            let iter = 0
                , _debug_downloads = []
                , _debug_all_hr    = []
                , _debug_all_lr    = []
                , entries          = []
                , downloadables    = []

            if (result && typeof result.entries !== 'undefined') {

                entries = result.entries;

                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/entries.json`, JSON.stringify(entries));

                entries = entries.map((entry, i) => {
                    // Process the entries
                });

                // Storing results for debugging. Ignore this. It works fine.
                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/proofs.csv`, _debug_all_lr.join("\r\n"));
                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/artwork.csv`, _debug_all_hr.join("\r\n"));
                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/downloadables.csv`, _debug_downloads.join("\r\n"));
            }

            // process result set.
        });
    });
};

 

My questions are:

 

  1. Why are the deleted files being included? They should not be, should they?
  2. Am I using the cursor and the list_folder/continue correctly?

Thanks in advance.

 

  • Greg-DB
    Dropbox Staff

    Thanks for the detailed writeup! 

     

    First, I should note that the 'dropbox-v2-api' package isn't made by Dropbox itself, so I can't really offer support for that or say what it may actually be doing under the hood, but I'll take a look here and advise with respect to the Dropbox API.

     

    Anyway, looking over your code and description, it looks like you have the right basic idea here for the most part (though it will depend on exactly what you're trying to accomplish of course), but there are a few things to note:

    • Regarding the deleted entries, note that the 'include_deleted' parameter only applies to "entries for files and folders that used to exist but were deleted", that is, at the time of the call to /2/files/list_folder/get_latest_cursor. Files or folders that are deleted after that call will still be reported later by /2/files/list_folder/continue as 'deleted'. Does this account for the entries you're seeing? Essentially, it may just be items deleted between 2:00 AM and 3:00 PM. If that doesn't seem to be it, perhaps you could share a sample so we can take a look? Feel free to open an API ticket privately if you'd prefer.
    • Also, I don't see you checking the 'has_more' value returned by /2/files/list_folder/continue. You're not guaranteed to get everything back in one call, so you should check that 'has_more' value and call back again to /2/files/list_folder/continue as described in the /2/files/list_folder documentation.
    • Also, it may or may not make sense for your use case, but you don't need to call /2/files/list_folder/get_latest_cursor every day. You can store and re-use the last cursor you received to be able to just receive updates about changes that have occurred since you received that cursor. That would let you track all changes over time. As written, it seems you're not monitoring anything that occurs between 3:00 PM and 2:00 AM. The Detecting Changes guide may be helpful, if you haven't already read it.
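To illustrate the first point above: since items deleted after the cursor was created are still reported by /2/files/list_folder/continue, one option is to drop them on the client side. This is only a minimal sketch. `withoutDeleted` and `pdfFilesOnly` are hypothetical helper names, the sample entries are made up, and the code assumes each entry carries the '.tag' and 'name' fields.

```javascript
// Hypothetical helper: keep everything except entries tagged as deleted.
function withoutDeleted(entries) {
    return entries.filter((entry) => entry['.tag'] !== 'deleted');
}

// Hypothetical helper: keep only file entries whose name ends in .pdf.
function pdfFilesOnly(entries) {
    return entries.filter((entry) =>
        entry['.tag'] === 'file' && entry.name.toLowerCase().endsWith('.pdf'));
}

// Made-up sample data in the shape of list_folder entries.
const sample = [
    { '.tag': 'file',    name: 'proof-01.pdf' },
    { '.tag': 'deleted', name: 'old-proof.pdf' },
    { '.tag': 'folder',  name: 'artwork' },
    { '.tag': 'file',    name: 'cover.PSD' }
];

console.log(withoutDeleted(sample).length);           // 3
console.log(pdfFilesOnly(sample).map((e) => e.name)); // [ 'proof-01.pdf' ]
```

Filtering after the fact keeps the cursor logic untouched, so the same cursor can still be used to see deletions if they ever become relevant.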
    • iconify
      Explorer | Level 4

      OK, that clarifies the deleted file issue perfectly. I thought it meant to exclude all deleted files, so that makes sense.

       

      I do need to update the cursor every day because I only want the files for that day. If I don't update the cursor, won't that give me all files since the cursor? So that would give me all of the files going back possibly multiple days. That won't fit my use case.

       

      Yes, I caught the has_more issue last night. Technically that is the right thing to do, but it won't have any tangible impact since I have the limit set to 2,000 files and we are not coming anywhere close to that. The most I have seen to date is 900 in one day, including the deleted files. I will deploy the update once I have tested it; I did not include it here because I have not fully tested the new code.

       

      Thanks for the response. I think this resolves my issue. I mainly needed to confirm I'm understanding the way the API endpoint works and to clarify on the deleted files.

      • Greg-DB
        Dropbox Staff

        Great, I'm glad that helps.

         

        To further clarify a few things though:

         

        "I only want the files for that day. If I don't update the cursor, won't that give me all files since the cursor? So that would give me all of the files going back possibly multiple days."

         

        One option is to always update your stored cursor to be the latest cursor you last received, e.g., from /2/files/list_folder/continue itself. When you then call /2/files/list_folder/continue again, you'll only receive updates that occurred since that call to /2/files/list_folder/continue that gave you that cursor. 
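A rough sketch of that rolling-cursor idea, under stated assumptions: `loadCursor`, `fetchChanges` and `saveCursor` are hypothetical functions supplied by the caller (for example, backed by S3 and /2/files/list_folder/continue), and `fetchChanges` is assumed to resolve to an object with `entries` and the newest `cursor`.

```javascript
// One polling run: read the stored cursor, fetch the changes since it,
// then store the cursor that came back so the next run starts from there.
// All three dependencies are hypothetical and injected by the caller.
async function runPoll({ loadCursor, fetchChanges, saveCursor }) {
    const previousCursor = await loadCursor();
    const { entries, cursor } = await fetchChanges(previousCursor);

    // Save only after a successful fetch, so a failed run
    // is retried from the old cursor and nothing is skipped.
    await saveCursor(cursor);

    return entries;
}
```

With this shape, each run only sees what changed since the previous run, so a once-a-day schedule still yields one day's worth of changes without a separate call to /2/files/list_folder/get_latest_cursor.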

         

        "it won't have any tangible impact since I have the limit set to 2,000 files and we are not coming anywhere close to that. "

         

        Be aware that the "limit" is only an approximate upper bound on how many items Dropbox will return per page; it does not affect the lower bound. In some cases, Dropbox may have to return far fewer entries per page, in which case you do need to check and follow 'has_more'.
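Following 'has_more' could look roughly like the sketch below. It assumes a client with the same `(options, callback)` shape as the `dropbox(...)` calls in the question; `callDropbox` and `listAllChanges` are hypothetical names, and error handling is kept to a minimum.

```javascript
// Wraps a callback-style client (same shape as the calls in the question)
// in a Promise so the paging loop can use await.
function callDropbox(dropbox, resource, parameters) {
    return new Promise((resolve, reject) => {
        dropbox({ resource, parameters }, (err, result) => {
            if (err) { return reject(err); }
            resolve(result);
        });
    });
}

// Keeps calling files/list_folder/continue until has_more is false,
// collecting every entry and returning the final cursor.
async function listAllChanges(dropbox, startCursor) {
    const entries = [];
    let cursor = startCursor;
    let hasMore = true;

    while (hasMore) {
        const page = await callDropbox(dropbox, 'files/list_folder/continue', { cursor });
        entries.push(...page.entries);
        cursor = page.cursor;    // each page comes with a new cursor
        hasMore = page.has_more; // false once nothing is left to fetch
    }

    return { entries, cursor };
}
```

The loop does not depend on the page size at all, so it behaves the same whether a page holds 2,000 entries or only a handful.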