
Engine recognition

How is GSA SER able to identify websites? Based on what? Footprints?
I ask because I cleaned my list of wikis with the clean-up option from Tools. After that I used those websites (100% of them were wikis, fully working) and I was still getting a lot of "no engine matches" results. How is that possible?

Comments

  • SvenSven www.GSA-Online.de
    Have a look in the engine scripts themselves. There you find the "page must have" entries ... that's how sites are identified. (See the first sketch after this thread for the idea.)
  • @sven : I just checked the files.
    Then why did I get so many "no engine matches"? I cleaned the list and then imported it into my project. What could be the problem?
  • SvenSven www.GSA-Online.de
    Not working proxies, maybe.
  • @sven : When I cleaned those wikis, I remember I set it to retry the download 2-3 times. After it finished, I saved the unknown sites and used "import and sort", again with the option to retry the download 2-3 times, and got more wikis. I repeated this twice. Since GSA doesn't have this option in projects, I guess it tries to download and identify a website only once, right? Could this be another reason?
  • @banel - I've noticed that a lot of the time, sites that remove spam links will throw a 404 error or serve their own custom 404 page, which then doesn't contain the "page must have" variables.

    Or, like Sven said, proxies along with actual site errors can be an issue. A lot of sites get suspended for bandwidth, non-payment, non-compliance, etc. Or they just have shitty hosts and go down or take too long to load a lot of the time. (A quick pre-check like the second sketch below can help separate the two cases.)
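    To illustrate what the "page must have" check boils down to, here is a minimal sketch in Python (not GSA SER's actual code): each engine definition carries a set of footprint strings, and a page is matched to the first engine whose strings all appear in the downloaded HTML. The engine names and footprint strings below are made-up examples for illustration, not copied from the real engine files.

    ```python
    # Minimal sketch of footprint-based engine detection (illustrative only;
    # engine names and "page must have" strings are placeholder examples).

    ENGINES = {
        # engine name -> strings that must ALL appear in the page source
        "MediaWiki": ['name="generator" content="MediaWiki', "wgArticlePath"],
        "WikkaWiki": ["wikka.css", "Powered by WikkaWiki"],
    }

    def identify_engine(html: str) -> str | None:
        """Return the first engine whose footprint strings all occur in the HTML."""
        for engine, must_have in ENGINES.items():
            if all(marker in html for marker in must_have):
                return engine
        return None  # corresponds to a "no engine matches" result
    ```

    The point of the thread follows directly from this: if the page downloaded through a broken proxy, or the site returns a 404 / custom error page, none of the footprint strings are present and the URL is reported as "no engine matches" even though it really is a wiki.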
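    On the dead-site / error-page point, a rough pre-check like the sketch below (assuming the third-party `requests` library, a placeholder footprint string, and a hypothetical `wikis.txt` input file) can show whether a "no engine match" comes from the download failing, an HTTP error, or a page that loads fine but no longer contains the footprint.

    ```python
    # Rough pre-check sketch: fetch each URL once and report why it might not match.
    import requests

    FOOTPRINT = 'content="MediaWiki'   # placeholder "page must have" marker

    def check_url(url: str) -> str:
        try:
            r = requests.get(url, timeout=15)
        except requests.RequestException as e:
            return f"{url}: download failed ({e.__class__.__name__})"  # proxy/host problem
        if r.status_code != 200:
            return f"{url}: HTTP {r.status_code}"                      # 404 / suspended page
        if FOOTPRINT not in r.text:
            return f"{url}: loads, but footprint missing"              # engine removed or changed
        return f"{url}: looks fine"

    if __name__ == "__main__":
        for line in open("wikis.txt"):                                 # hypothetical URL list
            print(check_url(line.strip()))
    ```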