
Does/can/how does GSA grab all commentable URLs from a blog-comment-based platform?

If I have one or more URLs on a blog-comment-based platform that allows comment links to be dropped, and/or just the root domain, does GSA already have a function, or could one be implemented, that grabs all the other internal URLs on that domain that are also open to comments?

Or do I have to check all the submitted/verified URLs, sort by the blog comment related engines, and then run a site: search or link extractor in Scrapebox, or whatever, to grab them all myself?

And before the Karen newbies chime in, I'm not worried about too many links or diminishing returns from the same domain, blah blah blah.

Thanks

Comments

  • cherub SERnuke.com
    For blog comments, once I have a decent-sized verified list, I extract all the domains and then scrape site:domain.com for each of them, adding in some stopwords to try and get around the limitations Google puts on those sorts of searches. This usually gives me a load of other blog comment URLs from those domains.
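A minimal sketch of the workflow cherub describes above, written in Python under a few assumptions: the verified URLs have been exported to a plain text file (one URL per line, here called verified_urls.txt), and the stopword list is just a small illustrative example rather than anything taken from GSA SER.

```python
from urllib.parse import urlparse

# Example stopwords only; swap in simple words that fit your language/niche.
STOPWORDS = ["the", "and", "for", "with", "about"]

def unique_domains(path):
    """Collect the unique hostnames from a file of verified URLs (one per line)."""
    domains = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            host = urlparse(line.strip()).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            if host:
                domains.add(host)
    return sorted(domains)

def build_queries(domains):
    """Pair each domain with a plain site: search plus stopword variations."""
    queries = []
    for domain in domains:
        queries.append(f"site:{domain}")
        queries.extend(f"site:{domain} {word}" for word in STOPWORDS)
    return queries

if __name__ == "__main__":
    # "verified_urls.txt" is a hypothetical export of verified blog comment URLs.
    for query in build_queries(unique_domains("verified_urls.txt")):
        print(query)
```

The printed queries can then be fed to whichever scraper you already use; the stopword variants only exist to coax different result pages out of the same site: search.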
  • Sven www.GSA-Online.de
    There is a script command for doing this per engine, but for blog comments it is a general engine that would not work for specific ones. I need to look into this then.
  • googlealchemist Anywhere I want
    cherub said:
    For blog comments, once I have a decent-sized verified list, I extract all the domains and then scrape site:domain.com for each of them, adding in some stopwords to try and get around the limitations Google puts on those sorts of searches. This usually gives me a load of other blog comment URLs from those domains.
    thanks, that's what I figured my plan would be as well.
    What do you mean about the stop words to get around Google limitations?
  • googlealchemistgooglealchemist Anywhere I want
    Sven said:
    There is a script command for doing this per engine, but for blog comments it is a general engine that would not work for specific ones. I need to look into this then.
    That'd be awesome if you could automate this in some way...

    Not sure what you mean about the specific vs general engines though; I see about 20 different specific types of blog comments under the blog comment section?
  • cherub SERnuke.com
    googlealchemist said:
    cherub said:
    For blog comments, once I have a decent-sized verified list, I extract all the domains and then scrape site:domain.com for each of them, adding in some stopwords to try and get around the limitations Google puts on those sorts of searches. This usually gives me a load of other blog comment URLs from those domains.
    thanks, that's what I figured my plan would be as well.
    What do you mean about the stop words to get around Google limitations?
    Google will only display around 300-400 results max for any search, no matter if they say they have thousands of results available. Adding simple keywords or stopwords, negative keywords, etc. will usually bring back a slightly different set of results with possibly new URLs not previously given.
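To illustrate the point about the result cap, here is a small sketch (same assumptions as above: plain Python, purely illustrative data, nothing built into SER) of merging the results of several variations of one site: query and counting how many new URLs each variation contributes.

```python
def merge_result_sets(results_by_query):
    """Merge per-query result lists, reporting how many new URLs each variant adds."""
    seen = set()
    for query, urls in results_by_query.items():
        fresh = [url for url in urls if url not in seen]
        seen.update(fresh)
        print(f"{query}: {len(urls)} returned, {len(fresh)} new")
    return sorted(seen)

if __name__ == "__main__":
    # Purely illustrative result sets; in practice these come from your scraper.
    example = {
        "site:example.com": ["https://example.com/post-1", "https://example.com/post-2"],
        "site:example.com the": ["https://example.com/post-2", "https://example.com/post-3"],
        "site:example.com -category": ["https://example.com/post-3", "https://example.com/post-4"],
    }
    merged = merge_result_sets(example)
    print(f"total unique URLs: {len(merged)}")
```

Because each capped result set only partially overlaps with the others, the merged, de-duplicated list keeps growing as more variations are tried.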
  • edited July 18
    I do the same thing when scraping to "parse deeper". You can do this by setting "add stop words" 15 percent of the time in SER options, for example. Just don't use the example list as is. That list covers many languages and is made for everyone, and the footprints will likely get blocked quicker when you use odd characters on Google.com, for example, where Polish or Arabic or other patterns wouldn't really be searched by a human. Get a small set of very simple words in your own language that make sense for the search engine you are scraping.



    This is how you can do it in SER.

    It does work and brings back results you would otherwise not have gotten.

    You can use this list for English; it's not the one I use, just an older example I had lying around...

    https://pastebin.com/34SvzRk9
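For anyone scripting their own scraping outside of SER, here is a rough sketch of the behaviour described in the comment above: appending a simple stopword to a query a fixed percentage of the time. The 15% chance, the word list, and the example footprint are placeholders, not SER's own settings or the poster's list.

```python
import random

# A handful of simple English stopwords; the pastebin linked above is the
# poster's own, larger example list.
SIMPLE_STOPWORDS = ["the", "and", "that", "with", "this", "from"]

def maybe_add_stopword(query, chance=0.15, rng=random):
    """Append a random stopword to the query roughly `chance` of the time."""
    if rng.random() < chance:
        return f"{query} {rng.choice(SIMPLE_STOPWORDS)}"
    return query

if __name__ == "__main__":
    base = 'site:example.com "leave a comment"'
    for _ in range(10):
        print(maybe_add_stopword(base))
```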
  • googlealchemist Anywhere I want
    edited July 25
    cherub said:
    Google will only display around 300-400 results max for any search, no matter if they say they have thousands of results available. Adding simple keywords or stopwords, negative keywords, etc. will usually bring back a slightly different set of results with possibly new URLs not previously given.
    I never realized they throttled site: searches for grabbing all the inner URLs; I assumed that as long as I had enough proxies I'd avoid any issues there. Thanks for the tip!