
Questions About Filtering Scraped Target URLs Imported Into a Project

I have a scraped list of URLs which are all relevant to my keywords, and those are the only URLs I want to use to get links. I right-clicked on my project and imported target URLs from the file containing all the scraped URLs. I've unchecked every single engine under "search engines to use", unchecked "use URLs from global site lists if enabled", and unchecked the "analyze and post to competitor's backlinks" box. I think I did that correctly to ensure I'm only using my imported URLs to create links. Let me know if I forgot anything.

ANYWAY, I have 3 questions about how GSA SER interprets those URLs after I've imported them:

1 - I'm not sure what kinds of platforms/engines the scraped URLs belong to, so should I just go ahead and check every single platform under the "where to submit" section of the project and fill out every single box, so that if any of my URLs are of a given type, GSA SER will have that info? GSA SER figures out what platform each URL belongs to as it goes, correct? I've never imported a URL list, so I have no idea.

2 - If I only want to post to PR 1 and above for my Tier 1 campaign (which this project is), can I go into the "Filter URLs" section of Options and set that accordingly, and will it then skip the PR 0 and N/A URLs? If so, what does it do with the PR 0 and N/A URLs? Will it leave them in the remaining target URLs so that I can go ahead and use them for the second tier?

3 - Lastly, this is a bit less related, but when I scrape URLs via Scrapebox to give to GSA SER, should I trim them to root or leave them as the full URL before I import them into GSA SER? I don't know if I'm ruining GSA SER's ability to create a link if I trim to root... I really don't know how GSA SER functions in creating that link. Any insight here or on any of these questions would be huge.
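
Just so it's clear what I mean by "trim to root", it's a transformation like this (a rough Python sketch of what Scrapebox's trim-to-root does, nothing GSA SER-specific):

    from urllib.parse import urlparse

    def trim_to_root(url):
        # Keep only scheme + host; drop the path and query string:
        # https://example.com/blog/post?id=1 -> https://example.com/
        parts = urlparse(url)
        return f"{parts.scheme}://{parts.netloc}/"

    print(trim_to_root("http://forum.example.org/thread.php?id=42"))
    # -> http://forum.example.org/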

Thanks so much fellow GSA SERers!

Comments

  • s4nt0s Houston, Texas
    1) You will need to have the platform selected for it to detect the engine, so yes, check the engines you want to use, and if a URL fits that engine, SER will attempt to post to it. I would uncheck the platforms Exploit, Indexer, Referrer, URL Shortener, and Web 2.0.

    Keep in mind SER does have an identify and sort in feature: click the global options button > advanced tab > tools button > Import URLs (identify and sort in). This will identify and sort your URL list, and by default the URLs are saved in your global site lists identified folder, but you can also save them to a custom file if needed.

    So you can either import your URLs directly into a project or presort them; it's up to you.

    2) Yes, it will skip the URLs under PR 1 if you set that filter. Don't forget to check the option "skip also unknown PR". It's not going to leave them in the remaining target URLs, so you might need to reimport the list into your other project with different filters.

    3) Well, SER has the ability to try and find the login/signup page for platforms, but I personally wouldn't trim the URL to root. When you use a custom footprint, Google is returning the URL that meets that footprint, so I don't see any need to trim to root before importing into SER.
  • Awesome, thorough, and quick response, s4nt0s; I really appreciate it. A couple of quick follow-ups, if you don't mind.

    1A - What would be the reasoning behind unchecking platforms like URL shortener and Web 2.0? Are they generally spammy and best kept for a lower tier?

    1B - My reasoning for wanting to import directly into the project vs. "identify and sort in" is that if I import directly into the project, then I'll have full control and know I'm only building links from that list. Whereas if I "identify and sort in", then I'll have to use the global identified list in my project to get those links, but at the same time I'll SIMULTANEOUSLY be getting other identified targets which might be unrelated or from other keywords and which already existed in the global identified list. Am I correct with this thought process? So basically, if I import directly into the project, there won't be any mixing with other identified global targets for that project.

    I just saw that you said "you can also save them to a custom file if needed", which probably means you get what I'm saying. So does that include sorting them first and then letting you save to a custom file? How do I save them as a custom file, out of curiosity?

    2 - Got it, basically just import the exact same scraped list file into my second tier but with a filter to ONLY build links from URLs with PR 0 or N/A.

    3 - Thanks for the info. I just had a thought, though: if I trim to root, GSA SER will get the PR of the domain as a whole, whereas if I leave the URLs as is, it'll collect the PR of that specific page on the domain, correct? That would give drastically different results if I was using the PR filter in my project, wouldn't it? That's something to think about if that's the case.

    Thanks again!
  • s4nt0s Houston, Texas
    1) Well, for the first tier most people do go with contextual-type platforms like Web 2.0s or blog platforms, etc. For me personally, I go with a range of parasites like YouTube/Web 2.0/Q&A sites/social platforms, etc. Of course, those can't really be made in SER, but I do mix in other contextual SER platforms as well.

    You should have a plan for what you're trying to do, then create your project around that. If you're just going for MASS links, then by all means select all the platforms; otherwise, plan accordingly, pick the platforms you want for that tier, then import the list.


    2) By default the sorted links will be saved into the identified folder, and if you have any previous links in that folder from past projects, they will be used. If you don't want to do it like that, you can choose the "save to custom file" option. It will identify the platforms you want and save them all into a single .txt file, which you can then import directly into a project. To save them as a custom file, you just select that option lol. You'll see it when you go to identify and sort in and set it up.

    3) Well, you could just set the filter to skip sites above PR 1 so you don't hit the same sites as the first tier.

    4) In the Filter URLs section, look at the box that says "use PR of domain". You can click that and change where you want it to pull the PR from: root domain, subdomain, or page.
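
    If it helps to picture what that setting changes, here's a rough Python sketch of the three URLs a PR check could be run against (the root-domain split below is a naive assumption; real tools use a public suffix list to handle hosts like example.co.uk):

        from urllib.parse import urlparse

        def pr_check_targets(url):
            parts = urlparse(url)
            host = parts.netloc
            # Naive root-domain guess: the last two labels of the host.
            root = ".".join(host.split(".")[-2:])
            return {
                "page": url,
                "subdomain": f"{parts.scheme}://{host}/",
                "root domain": f"{parts.scheme}://{root}/",
            }

        for name, target in pr_check_targets("http://blog.example.com/2014/06/post.html").items():
            print(name, "->", target)
        # page -> http://blog.example.com/2014/06/post.html
        # subdomain -> http://blog.example.com/
        # root domain -> http://example.com/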

  • Great info. I'm going to try saving them all into a single .txt file, as a custom file. What's the difference between ticking "use engine filter" or not, out of curiosity? Won't it sort them either way?

    Otherwise:

    1 - Ah I got you, yeah I was blanking but yes it's simply an issue of quality. Good call.

    2 - And there it is; I never tried doing the "identify platform and sort in" function. I guess that's probably the way to do it if I want to identify the links ahead of time to know what they are.

    3 - True, that's another way of doing it. Good call.

    4 - Totally forgot about that feature; that's exactly what I need. I can leave the URLs as is then decide in SER which PR I want to look at, great.

    Thanks so much for the clarity on these questions!
  • s4nt0s Houston, Texas
    The engine filter allows you to filter for a specific type of platform. For example, if you only wanted WordPress article sites, you could import your raw scraped list, uncheck everything but the WordPress article platform, and it's only going to save the WordPress article URLs and nothing else. It just allows you to save a specific platform.
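
    Conceptually the filter is doing something like this (a toy Python sketch; the footprint strings and file names are made-up placeholders, and SER's real engine definitions are far more thorough):

        # Made-up footprint patterns, purely to show the idea.
        WORDPRESS_FOOTPRINTS = ("/wp-content/", "/wp-login.php", "/xmlrpc.php")

        def looks_like_wordpress(url):
            return any(fp in url.lower() for fp in WORDPRESS_FOOTPRINTS)

        # Hypothetical file names: read the raw scrape, keep only matches.
        with open("raw_scrape.txt") as src, open("wordpress_only.txt", "w") as dst:
            for line in src:
                url = line.strip()
                if url and looks_like_wordpress(url):
                    dst.write(url + "\n")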

    Keep in mind you don't have to use identify and sort in if you don't want to. You can just import them directly into your project. Do whatever works best for you :)
  • Much thanks, I'm still feeling my way through the software so this really helps.
  • Probably a dumb question, but after I've sorted the list into the two custom files (one identified, one unknown), it basically can't do anything with the unknown URLs, correct?

    I was about to say it'd be great if there were a tool to split them all up by category, but then I saw GSA Pi. Looks like a good tool; I'm thinking about getting it if I keep up with scraping my own links.
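
    Before I found Pi, I was picturing writing something like this myself (a toy Python sketch; the footprint table is invented and nothing like Pi's real rules):

        from collections import defaultdict

        # Invented footprint table, purely illustrative.
        PLATFORM_FOOTPRINTS = {
            "wordpress_article": ("/wp-content/",),
            "phpbb_forum": ("viewtopic.php",),
            "guestbook": ("guestbook",),
        }

        def classify(url):
            u = url.lower()
            for platform, footprints in PLATFORM_FOOTPRINTS.items():
                if any(fp in u for fp in footprints):
                    return platform
            return "unknown"

        # Route each harvested URL into a per-platform bucket,
        # then write one .txt file per bucket.
        buckets = defaultdict(list)
        with open("harvested.txt") as f:  # hypothetical input file
            for line in f:
                url = line.strip()
                if url:
                    buckets[classify(url)].append(url)

        for platform, urls in buckets.items():
            with open(platform + ".txt", "w") as out:
                out.write("\n".join(urls) + "\n")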
  • s4nt0s Houston, Texas
    Well, you can always take the unknown and run it through again to see if it picks up anything that it missed in the previous run. It's up to you if you want to spend the time doing that though. 

    The returns might be minimal, but it's up to you.

    The unknown URLs are ones that SER isn't capable of posting to, so yeah, they're pretty much useless.

  • Is there a way to demo Platform Identifier, by the way? I'm not sure if I need it or not.

    I've spent the last couple of weeks harvesting lots of URLs using footprints from both GSA SER and elsewhere online, and I want to use GSA SER to create links to my sites from these harvested URLs, but they're obviously all mixed together and not sorted by platform. I guess I don't know if GSA PI is necessary, or if I can just tick the platforms I want to create links with in GSA SER, load in my thousands of unsorted URLs, and have GSA SER figure out which URLs will work and which won't.

    Any insight on this would be great, thanks!


  • s4nt0s Houston, Texas
    Well, SER has built-in identification already, so you could just import the list and SER will detect the engines that you have selected and post to them automatically.
  • That's what I was thinking. Sounds good. What do people use Platform Identifier for, then, if GSA SER can detect the platform on its own? Is it anything more than simply being able to sort a huge list by platform? In other words, what are the practical applications? Much thanks as always, s4nt0s.