
Huge Difference Between GSA SER and Platform Identifier - Need Advice

Hey guys

I have a site list that I’ve processed in both GSA SER and Platform Identifier. With GSA SER, I got close to 3 million identified, while with Platform Identifier, I only got 712k.

The thing is, GSA SER takes me 7–10 days to process this list, while Platform Identifier usually takes 1–2 days with 5000 threads. This is really driving me mad because I can’t wait 10 days just to sort a list; that’s why I bought PI in the first place.

My question is: why is there such a huge difference? I would expect the results to be at least close, not off by millions. Which one should I trust more?

Any insights would be appreciated.

Comments

  • Sven www.GSA-Online.de
    SER sorts it into all matching engines, while PI only adds it to the closest match.
    I have already revisited the code to optimise it, but I really cannot find any bottleneck.
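
    Roughly speaking, the difference looks like this (just a conceptual sketch to illustrate the idea, not GSA's actual code or engine format):

        def matching_engines(page_html, engines):
            # engines: made-up dict of engine name -> list of footprint strings
            hits = []
            for name, footprints in engines.items():
                score = sum(1 for fp in footprints if fp.lower() in page_html.lower())
                if score:
                    hits.append((name, score))
            return hits

        def sort_like_ser(page_html, engines):
            # SER-style: file the URL under every engine whose footprints match
            return [name for name, _ in matching_engines(page_html, engines)]

        def sort_like_pi(page_html, engines):
            # PI-style: file the URL under the single best-scoring match only
            hits = matching_engines(page_html, engines)
            return [max(hits, key=lambda h: h[1])[0]] if hits else []

    So a site that matches both a "WordPress Article" and a "WordPress Blog Comment" engine would be counted under both in a SER-sorted list but only under one in a PI-sorted list, which by itself can inflate SER's identified total.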
  • edited November 15
    Hi @Sven

    I just tested GSA Platform Identifier again using the same settings as above, this time with 1000 threads, and the results are only slightly higher — from around 13% to 16% identified.

    However, I noticed something strange:

    The status bar says 100% finished, but PI keeps running for 5+ hours without actually processing any new URLs. It just sits there. Is this a bug?

    At this point, I’m not sure how to proceed anymore.


    I can’t wait for SER to process lists for 7–10 days, but PI gives far fewer results. And I really don’t want to import the raw list directly into SER and waste time/resources processing engines it can’t even post to.

    But right now it feels like I have no choice because PI isn’t finishing properly and SER takes way too long.

    Any advice on how to handle this would be appreciated.



  • Sven www.GSA-Online.de
    I will revisit the code next week and see if there is some more room to optimise things in SER.
  • royalmice WEBSITE: ---> https://asiavirtualsolutions.com | SKYPE: ---> asiavirtualsolutions
    @Anonymous


    Can you try with the settings below, with 5000 or even 1000 threads? You are just causing bottlenecks.

    Also, make sure to export the engines from GSA SER and paste them into the engines folder of GSA PI, overwriting the existing files. The GSA SER engines are updated more often than GSA PI's, and if you use SERNUKE, their engines are not in GSA PI by default. You will find the export in GSA SER under the Advanced tab \ Misc. (See screenshot below.)
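
    If you would rather script that copy than paste the files by hand, something like this would do it (the folder locations below are assumptions based on default installs, so double-check where your own SER and PI keep their Engines folders):

        import shutil
        from pathlib import Path

        # Assumed default locations - adjust them to your own installation.
        appdata = Path.home() / "AppData" / "Roaming"
        ser_engines = appdata / "GSA Search Engine Ranker" / "Engines"
        pi_engines = appdata / "GSA Platform Identifier" / "Engines"

        for ini in ser_engines.glob("*.ini"):
            shutil.copy2(ini, pi_engines / ini.name)  # overwrites files with the same name
            print("copied", ini.name)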

    Thanked by 1: Anonymous
  • googlealchemist Anywhere I want
    edited November 21
    Sven said:
    SER sorts it into all matching engines, while PI only adds it to the closest match.
    I have already revisited the code to optimise it, but I really cannot find any bottleneck.
    I wasn't aware of this difference. What was the reasoning behind making it like this? If we get an identification in PI for a WordPress article site, or a WordPress blog comment... would we miss one or the other? Or for a forum that does public topic posts and contextual profiles, would we miss one or the other?

    I was just re-reading a related post, and @sickseo was saying he uses it, and that it works, for this purpose, unless I'm misunderstanding one or both of you? https://forum.gsa-online.de/discussion/comment/196849/#Comment_196849
  • edited November 27
    @Sven

    My Test Results

    I ran a similar set of tests on the same list and here’s what I got:

    • 5000 threads → ~13% success rate
    • 1000 threads → ~16% success rate
    • 100 threads → ~20% success rate (but it took almost 3 weeks to finish)

    From what I can see, lowering the thread count improves stability and parsing accuracy, but the trade-off is speed. I also noticed that enabling the bandwidth limiter probably held back a few extra percentage points; without it, I might have squeezed out a bit more.

    Still, the numbers are far below what I used to get with GSA SER (2.5M identified) and GSA Platform Identifier (1.1M identified). I’ll try again by processing an export from GSA SER as @royalmice suggested, and see if that improves the results.

  • royalmice WEBSITE: ---> https://asiavirtualsolutions.com | SKYPE: ---> asiavirtualsolutions
    @Anonymous

    Some additional tips

    1. Don't use proxies - they will slow you down.
    2. Be sure to exclude GSA PI from Windows Defender and your malware scanner so it does not scan every connection and every saved result (both the GSA PI folder in Program Files and the GSA PI folder in AppData).
    3. Pre-scan the list before sorting it and remove duplicate URLs.
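
    If you want to script that pre-scan, a minimal pass along these lines (file names are just examples) drops junk lines and exact duplicate URLs before the list ever reaches PI:

        seen = set()
        with open("raw_list.txt", encoding="utf-8", errors="ignore") as src, \
             open("raw_list_clean.txt", "w", encoding="utf-8") as dst:
            for line in src:
                url = line.strip()
                if not url.startswith(("http://", "https://")):
                    continue  # skip blank or malformed lines
                if url in seen:
                    continue  # skip exact duplicate URLs
                seen.add(url)
                dst.write(url + "\n")
        print(len(seen), "unique URLs kept")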

  • edited November 27
    royalmice said:
    @Anonymous

    Some additional tips

    1. Don't use proxies - they will slow you down.
    2. Be sure to exclude GSA PI from Windows Defender and your malware scanner so it does not scan every connection and every saved result (both the GSA PI folder in Program Files and the GSA PI folder in AppData).
    3. Pre-scan the list before sorting it and remove duplicate URLs.

    Got it, thanks sir, appreciate it! 

    1. I don’t use proxies with GSA PI.
    2. I’ve already disabled both the virus scanner and firewall on my VPS, so that part is covered.
    3. When working with raw URL lists, I’m wondering about the best approach to duplicate removal.

      • Since these are raw URLs, we don’t yet know if they’re even postable. Even if identified, some may not have a registration or submission form.
      • If we process, say, 1000 unique URLs from one domain, many of them might not be usable, which feels like wasted time.

       Wouldn’t it be better at the raw processing stage to remove duplicates on the domain level instead of just the URL level? That way we avoid spending resources on multiple variations from the same domain that may not even support posting.
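
    For example, a rough domain-level pass could look like this (the file names are just placeholders, and whether to strip "www." or treat subdomains as separate sites is up to you):

        from urllib.parse import urlparse

        seen_domains = set()
        kept = []
        with open("raw_list_clean.txt", encoding="utf-8", errors="ignore") as src:
            for line in src:
                url = line.strip()
                host = urlparse(url).netloc.lower()
                if host.startswith("www."):
                    host = host[4:]  # treat www.example.com and example.com as one domain
                if not host or host in seen_domains:
                    continue  # keep only the first URL seen per domain
                seen_domains.add(host)
                kept.append(url)

        with open("one_url_per_domain.txt", "w", encoding="utf-8") as dst:
            dst.write("\n".join(kept) + "\n")
        print(len(kept), "domains kept")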

    One thing I’m curious about: I noticed in the picture above that you deselect the “wild matching” and “deep matching” options in GSA PI. Wouldn’t those improve the identification rate, or do they just waste resources and slow things down? I’m wondering if it’s better to leave them off for efficiency, or if they actually help in certain cases.

  • Following... I don't use GSA PI often, as for me it's cheaper and more affordable to buy a few link list services and just use those. However, whenever I want to merge my verified lists and extract their outbound links to expand the list, I use GSA PI, and as you said, even though GSA SER is slower, it always ends up with a better harvest than GSA PI. Like royalmice said, copying the GSA SER engines to PI works. I used to copy them directly into the folder; I never knew there was a direct export option in GSA SER.