Content Generator: Scrape all Articles from a custom domain

Hey guys,
is it possible to scrape all articles from a domain? I tried the search function, but I can't find an answer that helps me.

What I'm trying to do:

I have a domain with 150 articles. I want to enter the domain.tld so that the content generator scrapes all of these articles from all of its URLs.

Can anyone help me with that problem?

Thanks a lot!


  • Sven
    Accepted Answer
    Just add it as a custom source.
  • I tried that, but I get the message: "Sorry, not enough content to create any article. Try to select more sources or use more keywords."
  • Isn't it possible to scrape without entering a keyword? I selected 129 sources for that, but it didn't extract the content from the pages.
  • Sven
    Can you paste some log lines from scraping your custom source site? Or paste a screenshot of how you configured it.
  • Hi @Sven

    I'm interested in this too.

    What exactly would be the process?  When I create a GSC project it asks me to enter at least one keyword.

    My process is as follows:

    I create a project
    I enter any keyword
    I add the source(s) and set them as extract
    In output I choose "same article" and in "number of words" I set it from 10-20000

    And the error it gives me is: "Sorry, not enough content to create any article."
  • I worked around it by entering some keywords, like question words: "what can", "what is", "what about".

    And phrases for home improvement.
  • @przamunda:

    "In output I choose "same article" and in "number of words" I set it from 10-20000"

    I think 20000 is too much. Set it to 150-300 and test again.
  • Sven
    Accepted Answer
    Algorithm: "Same Article"
    Number of Words: 1-100000 (to get all articles, long or short)
    Keyword: Use related keywords or at least some common words like "and", "or", "a"...
  • przamunda Madrid
    Accepted Answer
    I've tried it, but it doesn't work very well. The workaround that is working for me is to scrape the sitemap of the site I want to use as a source and then use the extracted pages as sources.

    Maybe an idea would be to be able to add sitemaps as sources.
  • Sven
    Accepted Answer
    Well, I don't know the structure of the site, but maybe the articles are too many sublink clicks away?
  • przamunda Madrid
    Accepted Answer
    @tanuki, thanks for your input :-)

    @Sven I DM them to you
  • henningnet Germany
    Sorry for digging out this old thread, but I am trying to achieve the same thing at the moment.

    The idea of having the sitemap as a custom source would be awesome: GSA would scrape the articles listed in the sitemap, would not leave the site, and would need no keywords. Just plain downloading of articles.
  • Sven
    That's already working. GSA Content Generator can use your sitemap or RSS feed URL as a source. It goes through that structure and extracts the links to parse them for articles.
  • henningnet Germany
    OK cool, I will try that out. As keywords I would enter a list of words (let's say in German) that appear in every article, and it would download them?

  • Sven
    Yes, that would work. Though you would still get "articles" that might not be related at all.
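
For anyone who wants to try the sitemap workaround from this thread by hand, here is a minimal Python sketch of the "collect the article URLs from the sitemap" step. It is not how GSA Content Generator works internally. The function name extract_sitemap_urls, the sample XML, and the optional domain filter are assumptions made up for illustration; the only thing relied on is the standard sitemap layout, where each page URL sits in a <loc> element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def extract_sitemap_urls(xml_text, domain=None):
    """Return all <loc> URLs found in a sitemap document.

    If `domain` is given, keep only URLs whose host matches it exactly,
    so the resulting source list does not leave the site.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for elem in root.iter():
        # Parsed tags look like "{namespace}loc"; compare the local name only.
        # Note: this also picks up <loc> entries of a sitemap index file.
        if elem.tag.rsplit("}", 1)[-1] == "loc" and elem.text:
            url = elem.text.strip()
            if domain is None or urlparse(url).hostname == domain:
                urls.append(url)
    return urls


# Hypothetical sitemap content; in practice you would load the text of
# your own sitemap file here.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://domain.tld/article-1</loc></url>
  <url><loc>https://domain.tld/article-2</loc></url>
  <url><loc>https://other.example/offsite</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_sitemap_urls(SAMPLE, domain="domain.tld"):
        print(url)
```

The printed list can then be pasted in as custom sources, one URL per line. The host comparison is exact, so "www.domain.tld" and "domain.tld" count as different hosts; adjust the filter if your sitemap mixes both.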