
[Feature Request] - Filter Entries From File (Automatically)

Hinkys SEOSpartans.com - Catchalls for SER - 30 Day Free Trial
edited March 2016 in GSA Platform Identifier
Would it be possible to integrate the "filter entries from a file that are present in another file" function, the same one from SER (Advanced -> Tools -> Filter Entries From File), so that it works similarly to a delete duplicates project (automatically checking the file(s) every X minutes)?

For example, this would be useful for automatically removing all targets that are already present in SER's Identified folder after a platform identification run, or for removing all targets that were identified as part of previous PI runs.
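In other words, the requested feature boils down to a periodic set difference between a target file and one or more "already known" lists. A minimal sketch of that logic in Python, assuming plain-text URL lists with one entry per line (the file names, folder path, and 15-minute interval are placeholders, not actual PI/SER settings):

```python
# Minimal sketch of the requested auto-filter, assuming plain-text URL lists,
# one entry per line. File names, folder path, and the 15-minute interval are
# placeholders, not real PI/SER settings.
import time
from pathlib import Path

TARGETS_FILE = Path("new_targets.txt")    # e.g. output of a PI run (hypothetical name)
IDENTIFIED_DIR = Path("ser_identified")   # e.g. SER's Identified site-list folder (hypothetical path)
INTERVAL_SECONDS = 15 * 60                # "check the file(s) every X minutes"

def load_urls(path: Path) -> set[str]:
    """Read a plain-text URL list into a set, skipping blank lines."""
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
    return {line.strip() for line in lines if line.strip()}

def filter_targets() -> None:
    """Drop every target that already appears in any identified list."""
    known: set[str] = set()
    for list_file in IDENTIFIED_DIR.glob("*.txt"):
        known |= load_urls(list_file)
    remaining = load_urls(TARGETS_FILE) - known
    TARGETS_FILE.write_text("\n".join(sorted(remaining)) + "\n", encoding="utf-8")

while True:
    filter_targets()
    time.sleep(INTERVAL_SECONDS)
```

Running that kind of check on a schedule inside PI itself, the same way delete duplicates projects already re-check their files automatically, is what this request is asking for.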

Comments

  • s4nt0s Houston, Texas
    I'll look into this to see what it would take to add it. We can't add it right away, but I'll see if it's something we can add in an update in the near future. I feel it would be a good feature to have, thanks for the suggestion.
  • royalmice WEBSITE: ---> https://asiavirtualsolutions.com | SKYPE:---> asiavirtualsolutions
    edited March 2016
    @Hinkys @s4nt0s

    Sounds like a good idea. The only problem is that not many people use GSA SER as a platform identifier. I, for instance, use GSA Platform Identifier and then just import the .sl file into GSA SER's global Identified list.

    My biggest problem is the project target files; GSA SER just keeps adding to the list without checking whether an entry already exists.
    I spent 2 weeks removing duplicates from 10 GSA SER installs; the total number of duplicates was close to 1 trillion and they took up around 400 GB of space.

    So I am totally for any "set and forget" automation to remove dups from link lists and target URLs, even if it uses more resources.

    Hope @sven can consider this.
  • Hinkys SEOSpartans.com - Catchalls for SER - 30 Day Free Trial
    edited March 2016
    @s4nt0s
    Great, looking forward to it! I'm trying to fully automate the list building process and this feature would be invaluable in making the process efficient.

    @royalmice
    I was talking about the standalone GSA Platform Identifier software (this was posted in the GSA PI forum, although that isn't apparent from the post itself).

    As far as duplicates go, it would take a lot of processing power to check each new site against the entire identified list. With GSA Platform Identifier it's not that bad; just set a project to dedup your identified / verified folders every so often. But yeah, those duplicates add up REALLY fast.
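    Keeping the already-seen URLs in an in-memory set keeps that check cheap, which is roughly what a periodic dedup project amounts to. A rough sketch, assuming plain-text site lists in a placeholder folder (not an actual PI/SER path):

    ```python
    # Rough sketch of a set-based dedup pass over a folder of site-list files.
    # "identified" is a placeholder folder name, not an actual PI/SER path.
    from pathlib import Path

    def dedup_folder(folder: Path) -> None:
        seen: set[str] = set()
        for list_file in sorted(folder.glob("*.txt")):
            kept = []
            for line in list_file.read_text(encoding="utf-8", errors="ignore").splitlines():
                url = line.strip()
                if url and url not in seen:  # O(1) membership test, so checking the full list stays cheap
                    seen.add(url)
                    kept.append(url)
            list_file.write_text("\n".join(kept) + "\n", encoding="utf-8")

    dedup_folder(Path("identified"))
    ```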
  • Hi, I have just done a dedup of my identified list and wow, there was a lot in there.
    I have a question.
    I took one of the GSA SER folders, say WikkaWiki, and ran GSA PI on it.
    It picked up Wikka but also WordPress, pingbacks and general blogs, plus loads of unrecognised entries.
    Question 1: why are they all split up like that, and should I run GSA PI on each of the folders?
    Also, what happens to the unrecognised ones? Do they get deleted, or should I delete them?
    Sorry, I am a bit new to scraping.


    Derek
  • Wow, such an auto-filter on import would indeed save a great deal of time. Two thumbs up for this feature request.