
Any reason to test both an inner page and the homepage in the current version?

I don't remember which post I read it in, but it was about how GSA processes URLs when you import a new harvest of domains/URLs to test whether the software can get a link from them.

If I import domain.com/page123, it will look for an opportunity to place a link on that specific page. If it can't, it will automatically navigate to the root and look for an opportunity there before moving on to the next site?

If I import just the root, it will look for a link opportunity on the homepage (a registration/login/submit-article option, or a comment or guestbook form, etc.). But if there is no link opportunity there, it will navigate to an inner page, based on the CMS, and test for an opportunity there?

Someone had also mentioned that I might get some additional submitted/verified URLs if I took any inner pages I had uploaded but failed to get a link from, stripped them down to the root, and ran them through again that way.

Am I remembering this correctly?

Either way, with the current version of GSA, which might make that a moot point anyway, what's the best practice for this?

I have a ton of URLs that are inner pages and a ton of URLs that are just root domains from various scrapes.

Is it worth the resources to strip the whole list down to the roots, add those back to my master list, and then remove duplicates at the URL level rather than the domain level?
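Outside of GSA, that first option (strip to roots, merge back, dedupe on exact URL) can be done as a preprocessing step before import. A minimal Python sketch, assuming one URL per entry and that a scheme-less entry such as `domain.com/page123` should be treated as `http://`; `to_root` and `merge_with_roots` are hypothetical helper names, not anything GSA provides:

```python
from urllib.parse import urlsplit


def to_root(url: str) -> str:
    """Reduce a URL to scheme://host/, dropping path, query and fragment."""
    url = url.strip()
    if "://" not in url:
        # Assumption: scraped entries without a scheme are plain http.
        url = "http://" + url
    parts = urlsplit(url)
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}/"


def merge_with_roots(urls):
    """Keep every original URL, add its root, and drop exact-URL duplicates.

    Order is preserved: each URL is followed by its root the first time
    that root is seen.
    """
    seen = set()
    merged = []
    for url in urls:
        for candidate in (url.strip(), to_root(url)):
            if candidate not in seen:
                seen.add(candidate)
                merged.append(candidate)
    return merged


if __name__ == "__main__":
    sample = ["http://domain.com/page123", "http://domain.com/page456", "http://other.org/"]
    for line in merge_with_roots(sample):
        print(line)
```

Because duplicates are removed on the exact URL, `http://domain.com/page123` and `http://domain.com/` both survive, which is what a URL-level dedupe implies. Note that `www.domain.com` and `domain.com` are treated as different roots in this sketch.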

Or is it better overall now to strip the whole list down to the root domains and upload only that list, without bothering with any of the specific inner URLs?

Or maybe it doesn't matter in the slightest, and I should just upload my mix of root and inner URLs, deduplicated at the domain level?
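Deduplicating at the domain level instead keeps only one entry per host, whether that entry is a root or an inner page. A minimal Python sketch of that preprocessing step, assuming the first URL seen for each host is the one worth keeping and that a leading `www.` should be ignored when comparing hosts (both are illustrative choices; `host_of` and `dedupe_by_domain` are hypothetical names, not GSA features):

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercase host of a URL, ignoring a leading 'www.'."""
    url = url.strip()
    if "://" not in url:
        # Assumption: scraped entries without a scheme are plain http.
        url = "http://" + url
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def dedupe_by_domain(urls):
    """Keep the first URL seen for each host and drop the rest (order preserved)."""
    seen = set()
    kept = []
    for url in urls:
        host = host_of(url)
        if host and host not in seen:
            seen.add(host)
            kept.append(url.strip())
    return kept


if __name__ == "__main__":
    sample = ["http://www.domain.com/page123", "domain.com/", "https://other.org/guestbook"]
    for line in dedupe_by_domain(sample):
        print(line)
```

With this approach, which URL survives for a domain depends only on list order, so if a known-good inner page (e.g. a guestbook) should win over the bare root, it needs to appear earlier in the input.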

I'm just trying to be as efficient as possible here, since we're talking about processing tens of millions of domains.

Thanks.

Comments

  • Sven www.GSA-Online.de
    When URLs are imported into the site list and GSA tries to detect which platform each one uses, it does not navigate to inner pages. That only happens if you import the URL into your project and enable the option Try to locate new URL on "no engine match".
    So it's always better to give it a URL you're sure it can submit to, such as a blog post where you can place a link as a comment, or a guestbook page.
    Most engines, however, can be identified from any URL you give it, e.g. a WordPress or Drupal site, because it has footprints all over its source (not always visible).