
Engine recognition

How is GSA SER able to identify websites? Based on what? Footprints?
I ask because I cleaned my list of wikis with the clean-up option from Tools. After that I used those websites (100% of them were wikis, fully working) and I was still getting a lot of "no engine matches" results. How is that possible?

Comments

  • SvenSven www.GSA-Online.de
    Have a look in the engine scripts themselves. There you find the "page must have" entries ... that's how sites are identified. (See the first sketch after this thread for the idea.)
  • @sven : I just checked the files.
    Then why did I get so many "no engine matches"? I cleaned the list and then imported it into my project. What could be the problem?
  • SvenSven www.GSA-Online.de
    Not working proxies, maybe.
  • @sven : When I cleaned those wikis, I remember I set it to retry the download 2-3 times. After it finished, I saved the unknown sites and used "import and sort", again with the option to retry the download 2-3 times, and got more wikis. I repeated this twice. Since GSA doesn't have this option in projects, I guess it tries to download and identify a website only once, right? Could this be another reason?
  • @banel - I've noticed that a lot of the time, sites that remove spam links will throw a 404 error or serve their own custom 404 page, which then doesn't contain the "page must have" variables.

    Or, like Sven said, proxies along with actual site errors can be an issue. A lot of sites get suspended for bandwidth, non-payment, non-compliance, etc. Or they just have shitty hosts and go down or take too long to load a lot of the time. (A quick pre-check like the second sketch below can help separate the two cases.)
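    To illustrate what the "page must have" check boils down to, here is a minimal sketch in Python (not GSA SER's actual code): each engine definition carries a set of footprint strings, and a page is matched to the first engine whose strings all appear in the downloaded HTML. The engine names and footprint strings below are made-up examples for illustration, not copied from the real engine files.

    ```python
    # Minimal sketch of footprint-based engine detection (illustrative only;
    # engine names and "page must have" strings are placeholder examples).

    ENGINES = {
        # engine name -> strings that must ALL appear in the page source
        "MediaWiki": ['name="generator" content="MediaWiki', "wgArticlePath"],
        "WikkaWiki": ["wikka.css", "Powered by WikkaWiki"],
    }

    def identify_engine(html: str) -> str | None:
        """Return the first engine whose footprint strings all occur in the HTML."""
        for engine, must_have in ENGINES.items():
            if all(marker in html for marker in must_have):
                return engine
        return None  # corresponds to a "no engine matches" result
    ```

    The point of the thread follows directly from this: if the page downloaded through a broken proxy, or the site returns a 404 / custom error page, none of the footprint strings are present and the URL is reported as "no engine matches" even though it really is a wiki.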
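    On the dead-site / error-page point, a rough pre-check like the sketch below (assuming the third-party `requests` library, a placeholder footprint string, and a hypothetical `wikis.txt` input file) can show whether a "no engine match" comes from the download failing, an HTTP error, or a page that loads fine but no longer contains the footprint.

    ```python
    # Rough pre-check sketch: fetch each URL once and report why it might not match.
    import requests

    FOOTPRINT = 'content="MediaWiki'   # placeholder "page must have" marker

    def check_url(url: str) -> str:
        try:
            r = requests.get(url, timeout=15)
        except requests.RequestException as e:
            return f"{url}: download failed ({e.__class__.__name__})"  # proxy/host problem
        if r.status_code != 200:
            return f"{url}: HTTP {r.status_code}"                      # 404 / suspended page
        if FOOTPRINT not in r.text:
            return f"{url}: loads, but footprint missing"              # engine removed or changed
        return f"{url}: looks fine"

    if __name__ == "__main__":
        for line in open("wikis.txt"):                                 # hypothetical URL list
            print(check_url(line.strip()))
    ```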