
Resubmitting URLs

AlexR Cape Town
edited September 2012 in Need Help
I save submitted, verified and identified lists. 

I have two questions:
1) As the program gets more engines added, how do I get it to check the old lists and see if it can identify sites for the new engines in those?
2) Now that CS is far better, how do I get it to resubmit all URLs that failed due to captcha issues? (No point going out to find new URLs when we have a very big list we could work through first.)

Thanks!

Comments

  • 1) GSA saves only the URLs it could identify at the time. If it found "oxwall" URLs two weeks ago, before that engine was added, it would have dropped them. If it finds the same URLs now, it will identify them as oxwall.
    If you want to re-identify them, just add your lists to the identifier tool, but I don't know why you would want to do that.
    Or do you mean your old Scrapebox lists? You could use the identifier tool for them, too (Advanced -> Tools). A quick pre-filter sketch follows at the end of this comment.
    2) I can't answer that, but it would be a nice feature. The problem is that these sites are marked as "already parsed", if I'm not wrong. You'll have to wait till Monday to get this question answered by Sven.
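    If you want a rough idea of what platforms are sitting in an old list before you run it through the identifier tool, a quick pre-filter by URL footprint can help. This is only a sketch: the footprint strings below are illustrative guesses, not what GSA actually matches on, and "scrapebox_list.txt" is a placeholder filename.

        # rough pre-filter for an old URL list before importing it into the identifier tool
        # NOTE: the footprint strings are illustrative guesses, not GSA's real detection rules
        FOOTPRINTS = {
            "oxwall": ["oxwall", "ow_userfiles"],
            "drupal": ["/node/", "?q=node"],
        }

        def split_by_footprint(infile):
            buckets = {name: [] for name in FOOTPRINTS}
            buckets["unknown"] = []
            with open(infile, encoding="utf-8", errors="ignore") as f:
                for line in f:
                    url = line.strip()
                    if not url:
                        continue
                    low = url.lower()
                    for name, hints in FOOTPRINTS.items():
                        if any(hint in low for hint in hints):
                            buckets[name].append(url)
                            break
                    else:
                        buckets["unknown"].append(url)
            return buckets

        if __name__ == "__main__":
            for name, urls in split_by_footprint("scrapebox_list.txt").items():  # placeholder file
                print(name, len(urls))

    Anything that lands in "unknown" is exactly what the identifier tool is for, so treat this as a quick overview, not a replacement for it.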
  • @Ozz: About your 2): if you look into your project folder there is a file for the project called [project name].processed. I don't want to tell you wrong stuff, but I think this is the basis for the decision whether a host is already parsed or not.
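    For a quick look at what is in that file - assuming it is plain text with one URL or host per line, which I have not verified - something like the sketch below would do; the path is just a placeholder:

        # peek at a project's .processed file to see which hosts GSA treats as already parsed
        # ASSUMPTION: plain-text file, one URL or hostname per line (format not verified)
        from urllib.parse import urlparse

        def processed_hosts(path):
            hosts = set()
            with open(path, encoding="utf-8", errors="ignore") as f:
                for line in f:
                    entry = line.strip()
                    if not entry:
                        continue
                    # entries may be full URLs or bare hostnames
                    hosts.add(urlparse(entry).netloc or entry)
            return hosts

        hosts = processed_hosts(r"C:\path\to\projects\myproject.processed")  # placeholder path
        print(len(hosts), "hosts marked as already parsed")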
  • I think having a way to reuse the already-parsed list would be great. Sometimes, if I'm not wrong, you might get "download failed" and the site becomes "already parsed", but maybe it was just a bad proxy and the site could still be used. So being able to access the list and choose what to do with it, without erasing it completely, would be good.
  • Being able to retry failed sites would be great. This would be another "idea" for keeping workless threads busy.
  • AlexR Cape Town
    Trying to catch up on all forum posts for the last week. Is it just me or are there a lot more members? I think I have read every post to date since the start of the forum...but the amount is increasing exponentially in the last week or so!

    I think that Sven said that you can clear target history and then it will try them again. (Been reading for about 3 hours now to catch up) Correct me if I have misunderstood this. 
  • AlexR Cape Town
    @Ozz & @Bytefaker - This is very useful...found the thread... 


    Delete History - Click this if you:

    a) want to submit to sites that you previously skipped (e.g. you didn't want to enter captchas back then but want to now)

    b) think it found sites in the past that it skipped, but that are now supported by e.g. newly added engines

    c) think it found sites in the past but somehow your internet connection prevented the submission from going through (e.g. a proxy issue)


    This is super useful and I think it should be done every 6 weeks. What do you guys think?
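    If you do clear it regularly, it may be worth backing up the project files first so you can roll back. A minimal sketch, assuming a project is stored as several files sharing its name (like the .processed file mentioned above) - the folder path and project name below are placeholders you would need to adjust:

        # back up one project's files before clicking "Delete History"
        # ASSUMPTION: placeholder paths/names - point them at your own install and project
        import glob, os, shutil, time

        PROJECTS_DIR = r"C:\path\to\GSA Search Engine Ranker\projects"  # placeholder
        PROJECT_NAME = "myproject"                                       # placeholder

        def backup_project(projects_dir, project_name):
            stamp = time.strftime("%Y%m%d-%H%M%S")
            dest = os.path.join(projects_dir, "backup-" + stamp)
            os.makedirs(dest, exist_ok=True)
            # copy every file that shares the project name, whatever its extension
            for path in glob.glob(os.path.join(projects_dir, project_name + ".*")):
                shutil.copy2(path, dest)
            return dest

        print("backed up to", backup_project(PROJECTS_DIR, PROJECT_NAME))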

  • I've done that once recently for all projects and it worked well. Which interval you use for this is very user dependent, because everyone feeds their lists at a different speed, uses different captcha solving services, etc.

    In general I would say it is not a bad idea to do this whenever new captchas are added to CS, if that is your only solving service.


  • AlexR Cape Town
    @sven - does Delete History remove all URLs, so it has to re-spider them, or does it only target failed submissions?
  • @sven - does Delete History remove all URLs, so it has to re-spider them, or does it only target failed submissions?

    +1 - wanted to know the same thing!
  • It would also be useful to have a way to retry these sites without posting to the same domain twice.
  • AlexR Cape Town
    This has been updated. In case anyone comes across this thread, check out https://forum.gsa-online.de/discussion/570/saving-sites-that-failed-or-weren039t-submitted#Item_8 - it has not been added to GSA.