
I've bought Scrape Genie, how do I parse specific website types?

For example, I want to parse contact forms. How do I do that? Do I need to create a custom parser using commands, or is there an easier way to do that?

Comments

  • Sven www.GSA-Online.de
    Well, add e.g. "contact" in filters for "URL must have" and also in "anchor texts". Then let it parse sublinks to at least 1 level deep and restrict things to the domain. That should basically be it.
    Thanked by 1Ak1RA51
  • Scrape Genie is great software! You can have many projects running at the same time doing a wide variety of tasks, and see that high-level view of each project from the front GUI. It's next level for a scraper, but calling it just a scraper would also not be doing it justice!

    There are already some templates and plugins to start with, and new useful features are added frequently! Plus you can define your own, like you said. I have a contact form project and it's similar to what Sven suggested.
    Thanked by 1Ak1RA51
  • edited September 4
    Sven said:
    Well, add e.g. "contact" in filters for "URL must have" and also in "anchor texts". Then let it parse sublinks to at least 1 level deep and restrict things to the domain. That should basically be it.
    I tried this but I'm getting no results. Could you do a video on this maybe, if possible? I am sure I am doing it wrong.
    I want to scrape contact forms also.
  • Well, a must-have filter for "contact" in the URL would help.

    Also maybe create a must-have filter on the HTML source for an ID of a form, like wpcf7.

    Then import some niche keywords and related search engines.

    If you've done the above for this project you'd get domain.{tld}/contact URLs and also be checking for the WordPress Contact Form 7 ID wpcf7 in the page source. You'd probably have to find these identifiers yourself for specific forms: inspect the page source, find the form, then an ID.
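The two checks described above (a "contact" must-have in the URL, plus a form ID like wpcf7 in the page source) can be sketched in plain Python; the HTML snippet and function names below are my own illustration, not anything from Scrape Genie:

```python
# Sample HTML resembling a page that embeds WordPress Contact Form 7.
# The wpcf7 identifier is the one mentioned above; the markup itself
# is made up for illustration.
html = """
<div class="wpcf7" id="wpcf7-f123-o1">
  <form action="/contact/#wpcf7-f123-o1" method="post">
    <input type="text" name="your-name">
  </form>
</div>
"""

def has_contact_form(source: str, form_id_marker: str = "wpcf7") -> bool:
    """Return True if the page source contains the given form identifier."""
    return form_id_marker in source

def looks_like_contact_url(url: str) -> bool:
    """Return True if the URL contains 'contact' (the must-have filter idea)."""
    return "contact" in url.lower()

print(has_contact_form(html))                                   # True
print(looks_like_contact_url("https://example.com/contact/"))   # True
print(looks_like_contact_url("https://example.com/about"))      # False
```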

    Parsing sublinks 1 level deep should grab the contact URL if it's there on page 1, and restricting to the domain should only parse URLs from the target domain when finding the contact URL.
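The "parse sublinks 1 level deep, restrict to domain" behaviour can also be sketched outside the tool; the page URL and markup below are hypothetical, just to show the filtering logic:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical front page of a target site, made up to illustrate
# level-1 sublink parsing restricted to the target domain.
page_url = "https://example.com/"
page_html = """
<a href="/about">About</a>
<a href="/contact">Contact us</a>
<a href="https://other-site.com/contact">External</a>
"""

class LinkCollector(HTMLParser):
    """Collect absolute hrefs from anchor tags (the level-1 sublinks)."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(page_url, value))

collector = LinkCollector()
collector.feed(page_html)

domain = urlparse(page_url).netloc
# Keep only same-domain sublinks whose URL contains "contact".
contact_urls = [u for u in collector.links
                if urlparse(u).netloc == domain and "contact" in u.lower()]
print(contact_urls)  # ['https://example.com/contact']
```

The external link is dropped by the domain restriction and /about by the "contact" must-have filter, which mirrors the project setup described above.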


    It takes a bit to get used to because it's doing all these things in one project; if you're used to ScrapeBox this would be a multipart workflow. I think this is what makes Scrape Genie a great tool! There's no limit on projects and you can get very granular with what you're scraping, exporting, sending, etc.

    When you make a filter list, also be aware of the AND or OR filter selection.

    I really could write a book on this topic, but maybe this is some help?

    This will create a content must-have filter for "contact" in the URL.



    Be aware of the "List Type": if you select AND you'll get fewer results, as they must match all the filters. You can add as many filters as you want, your own or from templates.
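The AND/OR "List Type" behaviour amounts to all-vs-any matching; a rough sketch (the function and parameter names are my own, not Scrape Genie's internals):

```python
def url_passes(url: str, filters: list[str], list_type: str = "AND") -> bool:
    """With AND every filter must match; with OR any single match is enough."""
    hits = [f.lower() in url.lower() for f in filters]
    return all(hits) if list_type == "AND" else any(hits)

filters = ["contact", "wpcf7"]
# AND is stricter: this URL matches "contact" but not "wpcf7".
print(url_passes("https://example.com/contact", filters, "AND"))  # False
print(url_passes("https://example.com/contact", filters, "OR"))   # True
```

This is why AND gives fewer results: a URL has to satisfy every filter in the list rather than just one.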




    This is looking for a form ID in the HTML source. Best to start small and selectively craft it as you go, IMO.

    Then you have a template of your own to reuse, constantly pumping out the data you need from as many projects as you wish, all with their own end goals.

    Scrape Genie - it's quite complex, feature rich, fully automated and a very nice tool
    ;) 



    Thanked by 1daverawcus
  • Do I need to add anything in the parser tab?

    Thanks for the help, by the way :)
  • Sven www.GSA-Online.de
    Next update will offer you to use project templates where I have added one that will scrape for contact pages.
  • daverawcus I don't believe you need to add a parser for this example, but you will get a message like "You selected no parser, do you want to continue?" GSA is just really good with the double-checking messages!

    I just double checked; I had an unrelated parser for X IDs in the project, so I must have been playing around.

    But if Sven makes a template then I guess we can just use that  :)
  • Yeah, I am definitely doing something wrong.

    I copied all the settings above, no parser set up.

    I can see the scraped pages in the queue, and then when I parse I can see some contact form URLs in the queue, but nothing goes into the results tab.


  • edited September 5
    Sven said:
    Next update will offer you to use project templates where I have added one that will scrape for contact pages.
    Any rough idea when this will be? Just deciding whether to struggle on (I think I am close) or just wait for the template :smile:
  • Sven www.GSA-Online.de
    It's already out.
    Thanked by 1daverawcus
  •  I can see some contact form URLs in the queue

    So then those are the contact URLs it found, and it's working? Unless you want to do something further, like parse something else and/or send it somewhere, I don't think you will see more, as you're just scraping for contact forms here. I think maybe the confusion is that it's doing all the steps in one project.

    Maybe you could add a step to auto export to a file to be used elsewhere?

    Or am I missing something?
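A minimal sketch of the auto-export idea: write whatever contact URLs the project found to a plain text file so another tool can pick them up. The file name and URL list here are made up for illustration, not Scrape Genie settings:

```python
# Hypothetical list of contact URLs a project has found.
found = ["https://example.com/contact", "https://shop.example.org/contact-us"]

# Write them one per line to a text file for use elsewhere.
with open("contact_urls.txt", "w", encoding="utf-8") as fh:
    for url in found:
        fh.write(url + "\n")
```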

  • edited September 5
     I can see some contact form URLs in the queue

    So then those are the contact URLs it found, and it's working? Unless you want to do something further, like parse something else and/or send it somewhere, I don't think you will see more, as you're just scraping for contact forms here. I think maybe the confusion is that it's doing all the steps in one project.

    Maybe you could add a step to auto export to a file to be used elsewhere?

    Or am I missing something?

    The template puts them in the results now; the template parses the page title, which is probably what I was missing :)
  • Sweet, I'll have to update and check.  :)