
Import URLs possible improvement

When using the 'Import URLs (identify platform and sort in)' option, would it be possible to add a filter that sets a maximum OBL (outbound link) limit for a URL to be saved? I'm thinking primarily of blog comments, where I don't want to bother keeping pages with thousands of OBLs.

Comments

  • I got around this by running each page through BeautifulSoup (sometimes Selenium) to collect the XPaths of all outbound links, and then saving the URL only if its OBL count was below a certain threshold. I then used this sanitized list for identification.

    Are you familiar with Python? I can share the code here with you if you like.

    Note, though: comment pages will eventually end up with thousands of OBLs anyway, as scrapers pick up the URL and spam it six ways till Sunday. So the benefit is likely only initial, and the check is resource-intensive if you process 20m+ links.
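
A minimal sketch of the kind of OBL filter described in this comment, not the poster's actual code. It assumes the pages have already been fetched, and it uses Python's standard-library `html.parser` in place of BeautifulSoup or Selenium so it runs without extra installs. The names `count_outbound_links` and `filter_by_obl` are hypothetical, as are the choices to count only distinct `http`/`https` links and to treat any different host (including a `www.` variant) as outbound.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def count_outbound_links(html, page_url):
    """Count distinct http(s) links whose host differs from page_url's host."""
    page_host = urlparse(page_url).netloc.lower()
    collector = _LinkCollector()
    collector.feed(html)

    outbound = set()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        # Assumption: mailto:, javascript: and similar schemes are not OBLs.
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() != page_host:
            outbound.add(absolute)
    return len(outbound)


def filter_by_obl(pages, max_obl):
    """Keep URLs whose page has at most max_obl outbound links.

    pages: iterable of (url, html) pairs that were fetched beforehand.
    """
    return [url for url, html in pages if count_outbound_links(html, url) <= max_obl]
```

The list returned by `filter_by_obl` would then be the sanitized list to import for identification. Fetching and parsing every page is where the cost comes from, which is why this gets expensive at tens of millions of links.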





  • Sven www.GSA-Online.de
    Added it for the next update.