
"already parsed" improvement?

edited October 2012 in Feature Requests
I don't know if it already works like this, but:

When using a filtering option such as "at least 1 of my keywords must be present", if the first page of a site/blog gets filtered out because it doesn't contain any keyword, is the whole site then marked as "already parsed"?
If so, I think that should be changed, because some filters work at the domain level while others apply only to the single page.
A typical example is scraping a list with another program and feeding it to GSA. You might have several pages from the same blog, and if the first one doesn't contain any keyword, all the URLs get skipped, even though some of the other posts could contain your keywords.


These are the filters I can think of:
- domain PR -> domain level (whole site is filtered and becomes "already parsed")
- page PR -> page level (you should filter this page but not whole domain)
- keyword -> page level
- OBL (outbound links) -> page level
- bad words filter -> page level
- bad words in domain/url -> domain level (?)
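To make the proposal concrete, here is a minimal sketch of how the split above could work when processing a scraped list. It is only an illustration of the requested behaviour, not how GSA actually implements "already parsed". The filter names, the `select_urls` function, and the idea that each URL arrives with the name of the first filter it failed are all hypothetical. Only domain-level failures mark the site as "already parsed"; a page-level failure drops just that page.

```python
from urllib.parse import urlparse

# Hypothetical filter names mirroring the list above; an illustration of the
# requested behaviour, not GSA's actual implementation.
DOMAIN_LEVEL_FILTERS = {"domain_pr", "bad_words_in_domain_url"}
PAGE_LEVEL_FILTERS = {"page_pr", "keyword", "obl", "bad_words"}


def domain_of(url: str) -> str:
    """Lower-cased host part of a URL, used as the 'site' key."""
    return urlparse(url).netloc.lower()


def select_urls(candidates):
    """Return the URLs that would still be considered for posting.

    candidates: iterable of (url, failed_filter) pairs, where failed_filter is
    the name of the first filter the page failed, or None if it passed them all.
    """
    already_parsed = set()  # domains rejected at domain level
    selected = []
    for url, failed_filter in candidates:
        domain = domain_of(url)
        if domain in already_parsed:
            continue  # whole site was rejected earlier
        if failed_filter is None:
            selected.append(url)
        elif failed_filter in DOMAIN_LEVEL_FILTERS:
            already_parsed.add(domain)  # skip every later page of this site
        elif failed_filter in PAGE_LEVEL_FILTERS:
            pass  # drop only this page; the site stays eligible
        else:
            raise ValueError(f"unknown filter: {failed_filter}")
    return selected


if __name__ == "__main__":
    urls = [
        ("http://blog.example.com/post-1", "keyword"),  # page-level failure
        ("http://blog.example.com/post-2", None),       # passes all filters
        ("http://spam.example.net/a", "domain_pr"),     # domain-level failure
        ("http://spam.example.net/b", None),            # skipped: site rejected
    ]
    print(select_urls(urls))  # ['http://blog.example.com/post-2']
```

With this split, the second post from the same blog is still tried even though the first one failed the keyword filter, while a site that fails the domain PR check is skipped entirely.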

Thanks.

Comments

  • Sven www.GSA-Online.de
    I see your point, but this kind of in-depth filtering is too slow. It would require keeping track of all kinds of information about why sites were filtered out, plus their history. Sorry, but for now I don't see a way to keep it fast while still recording enough data to avoid filtering anything out incorrectly.
  • If you think it would be a good thing, maybe the easy way to do it, although not perfect, would be to simply not mark the site as "already parsed" when the filter is a page-level one.
    This would mean that if I come across another page of the same site/blog, I would consider it for posting. It would also mean that if I come across the same page again, I would reconsider it, which would be a waste of time, but maybe not that much. You decide what is best.
  • An option to disable "already parsed" would be great. It really does impede my scraping style. Please Sven :)
  • Sven www.GSA-Online.de
    That option already exists and is called "Continuously try to post to a site even if failed before".
  • Oh shit, thanks @sven for clearing that up. That explains a lot.
  • Ohhh..... I wish I knew that. I'm glad I do now, cheers! :)
  • ron SERLists.com
    edited December 2013
    The 'wow' is when you see your LPM and verified links increase just because you ticked that little box.
  • edited December 2013
    ^^ 
    Not so much the LPM, but definitely the verified links \:D/
  • ron SERLists.com
    edited December 2013
    :D
