Scrape and post to .edu and .gov lists

edited September 2012 in Need Help
Hi Guys,

How can I teach GSA to scrape and post links to .edu or .gov blogs?

Please let me know the settings.

Thanks!

Comments

  • Sven www.GSA-Online.de
    Accepted Answer

    It's all covered in the FAQ and in several other threads. Put...

    !.gov !.edu

    ...in the URL filter.

  • Hi Sven,

    I applied that setting and ran it for 24 hours to see whether it works, but apparently it did not: 0 submitted, 0 verified.

    What other settings do I need to get it to post to .edu or .gov blogs?


  • So you've created a project that uses the "General Blogs" engine and, as Sven suggested, you have put !.gov and !.edu in the "skip sites with the following words in url/domain" field, right?

    With this setting, GSA will scrape for blogs in general and then post only to the ones that have either .edu or .gov in their URL.

    You didn't get any links, but what happens instead?
    When it searches, does it find targets?
    Does it skip them because of other filters that you have set (PR, bad words, keywords...)?
    Do you get "wrong engine", "download failed", or other errors?
    Do you use proxies, captcha solvers...?

    Give more information so someone can help sort this out.

  • Sven www.GSA-Online.de
    Accepted Answer

    Make sure it is all on one line; otherwise one filter excludes the other and nothing will match.
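
To illustrate why splitting the entries matters, here is a minimal Python sketch of one possible reading of the filter. It is not GSA's actual code. It assumes that every filter line is checked separately, that a URL is skipped unless all lines allow it, and that a line of "!term" entries allows a URL containing any of its terms. The function names and the example URLs are made up for the illustration.

```python
def line_allows(url: str, line: str) -> bool:
    """Assumed rule: a line of '!term' entries allows a URL that contains ANY of its terms."""
    terms = [t[1:] for t in line.split() if t.startswith("!")]
    if not terms:
        # A line without '!term' entries does not restrict anything in this sketch.
        return True
    return any(term in url.lower() for term in terms)


def should_post(url: str, filter_lines: list[str]) -> bool:
    """Assumed rule: EVERY filter line must allow the URL, otherwise it is skipped."""
    return all(line_allows(url, line) for line in filter_lines)


# Hypothetical target URLs.
urls = [
    "http://blog.example.edu/post",
    "http://news.example.gov/article",
    "http://example.com/blog",
]

one_line = ["!.gov !.edu"]      # both entries on a single line
two_lines = ["!.gov", "!.edu"]  # each entry on its own line

kept_one_line = [u for u in urls if should_post(u, one_line)]
kept_two_lines = [u for u in urls if should_post(u, two_lines)]

print(kept_one_line)   # the .edu and the .gov URL
print(kept_two_lines)  # empty list
```

Under these assumptions, the single line keeps the .edu and .gov URLs, while the two separate lines reject everything: the "!.gov" line throws out the .edu URL and the "!.edu" line throws out the .gov URL.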

  • edited September 2012
    I have this problem, too. The negative filter doesn't work. I know it doesn't because when I have the filters enabled, GSA doesn't submit to my target URLs (I have a huge scraped list of over 11,000 URLs that are mostly forums, blogs, and Moodle sites). My target URLs are not the problem, because if I remove the negative filters, I am able to submit to 300+ sites!

    So either it is bugged or there's a new negative filter. By the way, I have no restrictions on OBL or PR.

    See my post here - https://forum.gsa-online.de/discussion/431/is-the-negative-filter-bugged#Item_1
  • Now it is working!

    Thanks to Sven

    "Make sure it is one line else one filter is excluding the other and nothing will match." - This is the correct solution!

    Aside from that, I reduced my search engines to US engines only. (Before the issue, I was using the all "engines w/ english language" setting.)

    Thank you again Sven
