Dropbox "Conflicted Copy" Handling

Hi Sven, long-term power user here. I was not sure which section to put this suggestion in: it would fit under either Platform Identifier or SER, and the solution could be applied to both.

On occasion I have up to 5 SER instances and 3 Platform Identifier instances running on multiple servers, and I use Dropbox to sync my files from server to server as they are processed. I know I am one of many who use this basic but highly effective setup.

No matter how fast your servers are, Dropbox's latency means "conflicted copies" are created at scale for files larger than 1 MB (I guess this depends on your setup), and this can end up making a huge mess. As an example, my identified folder has almost 6000 files because of this.

Request:

Platform Identifier Feature

Automated, hands-free dedupe and merge of conflicted files. When conflicted files are detected, such as:

Example------------------------------------------------
sitelist_Article-AltoCMS-LiveStreet (swedishpowerhouse's conflicted copy 2018-06-04 (1)).txt
sitelist_Article-AltoCMS-LiveStreet (swedishpowerhouse's conflicted copy 2018-06-04 (2)).txt
sitelist_Article-AltoCMS-LiveStreet (swedishpowerhouse's conflicted copy 2018-06-04 (3)).txt
sitelist_Article-AltoCMS-LiveStreet (swedishpowerhouse's conflicted copy 2018-06-04 (4)).txt
sitelist_Article-AltoCMS-LiveStreet (swedishpowerhouse's conflicted copy 2018-06-04).txt
sitelist_Article-AltoCMS-LiveStreet (swedishpowerhouse's conflicted copy 2018-06-08 (1)).txt
sitelist_Article-AltoCMS-LiveStreet (swedishpowerhouse's conflicted copy 2018-06-08 (2)).txt
sitelist_Article-AltoCMS-LiveStreet (swedishpowerhouse's conflicted copy 2018-06-08 (3)).txt
sitelist_Article-AltoCMS-LiveStreet (swedishpowerhouse's conflicted copy 2018-06-08 (5)).txt
sitelist_Article-AltoCMS-LiveStreet (swedishpowerhouse's conflicted copy 2018-06-08).txt
sitelist_Article-AltoCMS-LiveStreet.txt
/Example-----------------------------------------------

All of these files are automatically deduped and merged into the original file:

sitelist_Article-AltoCMS-LiveStreet.txt

and finally the now-redundant conflicted copies are deleted.
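
To make the request concrete, here is a minimal Python sketch of the dedupe/merge/delete step described above. It is only an illustration of the idea, not how Platform Identifier or SER does (or would do) it. The function names and the filename pattern are my own assumptions, based purely on the example names listed above, and it assumes each site list is a plain text file with one URL per line.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern, derived only from the example file names above, e.g.
#   "sitelist_X (host's conflicted copy 2018-06-04 (1)).txt"
# It captures the original stem ("sitelist_X") and the extension (".txt").
CONFLICT_RE = re.compile(
    r"^(?P<stem>.+?) \(.*conflicted copy \d{4}-\d{2}-\d{2}(?: \(\d+\))?\)"
    r"(?P<ext>\.[^.]+)$"
)


def original_name(filename):
    """Return the original file name for a conflicted copy, or None."""
    match = CONFLICT_RE.match(filename)
    return match.group("stem") + match.group("ext") if match else None


def merge_conflicted_copies(folder):
    """Merge conflicted copies into their original file, dedupe, delete copies.

    Returns a dict mapping each original file name to its unique line count.
    """
    folder = Path(folder)
    groups = defaultdict(list)
    for path in folder.iterdir():
        target = original_name(path.name) if path.is_file() else None
        if target:
            groups[target].append(path)

    result = {}
    for target, copies in groups.items():
        original = folder / target
        sources = ([original] if original.exists() else []) + sorted(copies)
        unique = {}  # dict preserves insertion order: first occurrence wins
        for source in sources:
            text = source.read_text(encoding="utf-8", errors="replace")
            for line in text.splitlines():
                line = line.strip()
                if line:
                    unique.setdefault(line, None)
        original.write_text("".join(u + "\n" for u in unique), encoding="utf-8")
        for copy in copies:
            copy.unlink()  # the copy's lines now live in the original file
        result[target] = len(unique)
    return result
```

Calling `merge_conflicted_copies("path/to/identified")` (a placeholder path) would fold every conflicted copy in that folder into its original list. A sketch like this should only be run while nothing else is writing to or syncing the folder.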

Request:

SER Feature

It makes no difference to me which program handles this process, but I wanted to point out how Botmaster's HREFER handles dedupe. In HREFER you have the option to dedupe the parsed files on launch.

Perhaps if SER did the simple dedupe/merge/delete as described in the platform identifier request on boot, this would sort the issue and help keep everyone's GSA directories streamlined.

Right now I have quick and dirty solutions via PI, but I still have to pause all my servers once a week to sort this out, otherwise it kills Dropbox.

Keep up the great work, would be great if you can implement the above but in no way a deal breaker :)









Comments

  • 0
    Sven www.GSA-Online.de
    Accepted Answer
    Hmm, so you want it to automatically merge the *conflicted copy* files in your example into the original one?
  • 0
    Hi Sven, that's right: merge all the conflicted copies into the original file and dedupe it if possible.




  • 0
    TheGypsy Madrid
    This was a problem way back when Ron and his team were trying to set up automated SER lists. They failed and went back to manual processing, if I remember correctly, but Sven may find some solution to this.
  • 0
    It would be a game changer if this was addressed, that is for sure :)
  • 0
    royalmice WEBSITE: ---> http://asiavirtualsolutions.com | SKYPE:---> asiavirtualsolutions
    This is not a GSA problem but rather a problem of writing to the same file at the same time from different apps, in your case 3 GSA PIs.

    I used to have a problem like that when I was checking and sorting the list from my home PC on a 200/100 Mbps line.

    But since I moved GSA Platform Identifier to a dedicated VPS, I no longer have conflict issues, and my identified unique list is 100,286,611 URLs, or just over 2 GB.

     
    I did, however, notice that GSA Platform Identifier sometimes delays saving the URLs it has sorted. So if you stop or restart GSA Platform Identifier, make sure you wait at least 15 minutes before doing the duplicate removal and list clean-up, because it will wait for the full configured delay before it saves what it has identified and sorted. In my case I have set it to save every 600 seconds, which is 10 minutes (the maximum you can set).



    Before you run the duplicate URL removal in GSA SER, GSA PI, or a custom tool (as in my case), make sure no Dropbox files are being updated. Once all Dropbox files are synced, exit Dropbox and then do the list clean-up. When it is done, restart Dropbox and let it sync the updated list.


    On the rare occasion that you do get a couple of conflicts, simply open the main folder where your URLs are saved, search within the folder for the term "conflict", select all the results, cut them, and paste them into a different folder. I just create a folder called conflict and paste all the conflict files in there.

    Now use the GSA merge tool to merge them all into one file and, when done, remove duplicate URLs. Then run only those through Platform Identifier again.

    It sounds like a lot of work, but it takes less than 5 minutes to do.
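
For anyone who would rather script those manual steps, the following is a rough Python sketch of the same routine: move every file with "conflict" in its name into a separate folder, merge them into one file, and drop duplicate URLs. It does not use the GSA merge tool. The function name, the folder arguments, and the `merged_conflicts.txt` output name are all hypothetical, and it assumes one URL per line.

```python
import shutil
from pathlib import Path


def quarantine_and_merge(list_folder, conflict_folder,
                         merged_name="merged_conflicts.txt"):
    """Move conflict files out of list_folder and merge them, deduped.

    Returns (path of the merged file, number of unique URLs).
    """
    list_folder = Path(list_folder)
    conflict_folder = Path(conflict_folder)
    conflict_folder.mkdir(parents=True, exist_ok=True)

    # Step 1: "search for conflict, cut, paste into a different folder".
    moved = []
    for path in sorted(list_folder.iterdir()):
        if path.is_file() and "conflict" in path.name.lower():
            destination = conflict_folder / path.name
            shutil.move(str(path), str(destination))
            moved.append(destination)

    # Step 2: merge everything into one list, keeping the first occurrence.
    unique_urls = {}
    for path in moved:
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            url = line.strip()
            if url:
                unique_urls.setdefault(url, None)

    merged_path = conflict_folder / merged_name
    merged_path.write_text("".join(u + "\n" for u in unique_urls),
                           encoding="utf-8")
    return merged_path, len(unique_urls)
```

The merged file is what would then be run through Platform Identifier again. As with the manual routine, this is meant to be run only after Dropbox has finished syncing and has been closed.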

     


    Thanked by 1: 7ThDanWebNinja
  • 0
    Excellent info, royalmice, thank you for sharing your process.
  • 0
    Sven www.GSA-Online.de
    Accepted Answer
    The latest update will fix the conflict files in site lists every 10 minutes.
  • 0
    Woah! Amazing, thanks so much!
  • 0
    Just to confirm, this is working perfectly. I should have done a benchmark A/B test, but I can see my LPM is close to double what it was when running the same campaigns, now that the lists are cleaned up.
  • 0
    royalmice WEBSITE: ---> http://asiavirtualsolutions.com | SKYPE:---> asiavirtualsolutions
    You are welcome @7ThDanWebNinja
    Thanks @Sven