A tiny little function needed in search_online_for_urls

I have been using the GSA SER built-in scraper a bit. With my own se.dat it is still possible to scrape and identify a large number of sites. However, each search engine now needs more and more individual care, and I can't run it optimally. Hence this feature request.

I need a small text option per engine in se.dat, let's say "custom_delay_seconds", that would make SER query that search engine at most once every that many seconds. I know the search_online_for_urls thread starts synchronously every XX seconds, as set in the General/Submission options, so the custom option would simply skip querying the relevant SE (per proxy) whenever the time passed since the last query to that SE through that proxy is less than the value in the option.
I hope I made that clear. To restate it simply: I have some SEs that can only be queried every 180 s, whereas most of them can be queried as often as every 50 s :) The function would skip firing queries at those SEs until 50+50+50+50 = 200 s > 180 s have passed.
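To illustrate the requested skip logic, here is a minimal sketch in Python. SER is not scripted this way; the class `EngineThrottle`, its method `should_query`, and the names `SlowEngine`, `FastEngine` and `proxy1` are hypothetical, and `custom_delay_seconds` is only the option name proposed above, not an existing se.dat field. It assumes the search thread calls the check once per tick and remembers the last query time per (engine, proxy) pair.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class EngineThrottle:
    """Hypothetical throttle for the proposed se.dat option custom_delay_seconds.

    Tracks the last query time per (engine, proxy) pair and tells the caller
    whether a scheduled query should be fired or skipped on this tick.
    """
    delays: Dict[str, float]  # engine name -> custom_delay_seconds (assumed)
    default_delay: float = 0.0  # engines without the option are never throttled
    _last: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def should_query(self, engine: str, proxy: str, now: float) -> bool:
        delay = self.delays.get(engine, self.default_delay)
        last = self._last.get((engine, proxy))
        if last is not None and now - last < delay:
            return False  # too soon for this engine on this proxy: skip this tick
        self._last[(engine, proxy)] = now  # record the query we are about to fire
        return True


# Simulate the search thread waking every 50 s (the post's example numbers).
throttle = EngineThrottle(delays={"SlowEngine": 180, "FastEngine": 50})
ticks = [0, 50, 100, 150, 200, 250]
slow = [t for t in ticks if throttle.should_query("SlowEngine", "proxy1", t)]
fast = [t for t in ticks if throttle.should_query("FastEngine", "proxy1", t)]
print(slow)  # [0, 200]
print(fast)  # [0, 50, 100, 150, 200, 250]
```

The 180 s engine fires at 0 s and then again at 200 s, matching the 50+50+50+50 = 200 > 180 arithmetic, while the 50 s engine fires on every tick. Because the key is the (engine, proxy) pair, the same engine on a different proxy is throttled independently.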

Apart from the above, it would be wise to stop querying the current phrase once subsequent page queries return exactly the same SERP. If we get the same results on pages 14, 15, 16 and 17, the real result pages have run out and the SE is returning page one again to fool us. In the debug output this shows up as no new links found on the page, e.g. 0/100 results. Currently %pages% is capped (correct me if I am wrong), so resources are wasted querying page one over and over until the limit is reached.
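A minimal sketch of the early stop, in Python for illustration only. `scrape_phrase`, `fetch_page` and `fake_fetch` are hypothetical names, and the stop rule used here is "this page yielded no links we have not already seen" (the 0/100 case from the debug output), which is slightly broader than checking for an exactly identical SERP but also catches the engine falling back to page one.

```python
from typing import Callable, List, Sequence, Tuple


def scrape_phrase(
    fetch_page: Callable[[str, int], Sequence[str]],
    phrase: str,
    max_pages: int,
) -> Tuple[List[str], int]:
    """Fetch result pages for one phrase, stopping early on a repeated SERP.

    Returns the unique URLs collected and the number of pages actually fetched.
    """
    seen = set()
    collected: List[str] = []
    fetched = 0
    for page in range(1, max_pages + 1):
        urls = fetch_page(phrase, page)
        fetched += 1
        new: List[str] = []
        for url in urls:
            if url not in seen:  # keep only links not seen on earlier pages
                seen.add(url)
                new.append(url)
        if not new:
            # Empty page, or the engine is serving results we already have
            # (e.g. page one again): further pages would be wasted queries.
            break
        collected.extend(new)
    return collected, fetched


# Fake engine with three real pages that falls back to page one afterwards.
def fake_fetch(phrase: str, page: int) -> Sequence[str]:
    pages = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
    return pages.get(page, pages[1])


urls, fetched = scrape_phrase(fake_fetch, "some phrase", max_pages=17)
print(urls, fetched)  # ['a', 'b', 'c', 'd', 'e'] 4
```

With a page cap of 17, only 4 queries are sent instead of 17: the fourth page repeats page one, contributes nothing new, and ends the phrase.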

I hope this helps. If you are interested, I can provide more info, and I have some other ideas, but first things first.

Kind regards


Best Answer

  • Sven
    Accepted Answer
    In a PM... any description will do.


  • Sven
    Detection of whether the same links were extracted as on the previous page has already been added, but I guess I can optimize it. I need the custom se.dat for debugging.
  • Hello,
    My se.dat won't help you, because its entries point to servers that form another layer of abstraction :) I will prepare some generic, regular SE entries that let you debug the issues. I still have to test them. How would you like the file delivered?
