
Could somebody please explain a few things from this screenshot?

edited January 2014 in Need Help
Recently I scraped a few million URLs, set up 5 projects and imported around 2 million links into each. But when I look at the logs throughout the day, I see the same sites repeating over and over (please see the screenshot below). I already deduped all of the lists (GSA only found around 1k dupes in each), but this stuff still shows up:

image
A few questions regarding the above:

1. At the beginning we have 2 brackets [ ], sometimes empty, sometimes a "-" sometimes a "+" inside. What does each of these stand for?

2. At first I thought that "0685/7874" means that currently, the project is trying to post to site #685 out of all those that were imported. Looking at the picture, that obviously doesn't make sense since there are like 4-5 different domains that all have "0685". So what do these 2 numbers stand for?

3. As mentioned above, the same sites you see in the screenshot show up over and over again...why is GSA trying to post to these over and over? I already deduped the list and disabled "continuously try to post to a site even if failed before", yet a few hours later the log is still full of this and I fear this is wasting a ton of time and resources trying to post to the same sites.

Hope @Sven or someone else can shed some light on this.

Comments

  • do you have 'allow posting to the same site again' ticked in options?
  • As I mentioned in question #3, I have disabled "continuously try to post to a site even if failed before", which I believe is what you mean? So that can't be it.
  • No, that isn't what I mean; they are two different things. Look further down the list of options.
  • edited January 2014
    Oh you mean scheduled posting? No, that's also disabled for all these projects.

    Unfortunately can't edit OP anymore but for some reason I enabled the sitelist "identified" (about 30 mins ago) and look what's there:
    image
    These are obviously not official engines (hence "unknown"), yet GSA seems to think they are (according to the log: "matches engine opera.com", lol)...though that still wouldn't explain why it tries to post over and over again to the same URL. Not sure, maybe it's still helpful in figuring out the issue.
  • edited January 2014
    Why do you keep thinking I am talking about things other than what I stated? I made sure the wording was exactly as it is in the GSA project options to avoid confusion :D It's halfway down the list, by the search engines bit.
  • edited January 2014
    I'm pretty sure this time we were talking about the same thing, I just worded it badly:

    image

    Anyway, it's disabled, problem still there though.

    Anybody else?
  • Small update...this is getting crazy. I have this across all my campaigns now. All of them stuck in trying to post to opera.com and netlog.com (and in between a few submissions here and there to actual engines).

    image

    I don't get it, these aren't even officially integrated engines, so why is SER saying "matching engine netlog.com" when there is none? Almost every URL/domain of my scraped list is being identified as either opera or netlog...

    My understanding is obviously limited so I'd really appreciate some help with this.
  • Sven www.GSA-Online.de
    @johnmiller did you modify those engines in any way?
  • @Sven Nope, I didn't change anything inside GSA or the engine files :(
  • Sven www.GSA-Online.de
    Did you order the serengines.com engines? If so, did the order expire?
  • edited January 2014
    @Sven No, not a member of serengines. Do you think if I unchecked serengines in the projects this opera/netlog stuff would stop?

    Since this started my LPM is down from 80 to 15...
    image

    Edit: I see netlog.com and opera.com are actual engines inside the web 2.0's ("where to submit") - I'll uncheck these for now hoping that it fixes it. However, why does it then say "unknown" for these engines if they are officially inside GSA? And is it normal that GSA thinks for hundreds of thousands of URLs that they could be opera/netlog?
  • You have "Continuously try to post to a site..."
  • edited January 2014
    As mentioned in the first post of this thread (question #3), I have "Continuously try to post to a site..." disabled.
  • Sven www.GSA-Online.de
    @johnmiller I have no idea why this happens to you. Maybe the engine files are damaged and are now accepting every site. Could you get me access to that system somehow (PM me then)?
  • edited January 2014
    Well, I have now deactivated opera.com and netlog.com from the web 2.0's. Since then the problem is gone and LPM went up again. In case the engine files are broken, I assume they'll be fixed automatically when I download the next SER update?
  • I am also still curious about his #1 and #2; I always have been. There are things with this software that go completely unexplained.
  • Thanks, tsaimllc. Maybe @Sven would be so kind to enlighten us?
  • Sven www.GSA-Online.de
    @johnmiller just try to install the current version again and see if that brings back the problem. If not, then your engine files must have been somehow different from mine.
  • @Sven will do, thanks. Could you maybe answer question #1 and #2 I asked in OP? That would be awesome.
  • Sven www.GSA-Online.de
    >1. At the beginning we have 2 brackets [ ], sometimes empty, sometimes a "-" sometimes a "+" inside. What does each of these stand for?

    This means either a successful/positive message (+) or a negative one (-). There are also [!] for attention or [ ] for neutral.

    >2. At first I thought that "0685/7874" means that currently, the project is trying to post to site #685 out of all those that were imported. Looking at the picture, that obviously doesn't make sense since there are like 4-5 different domains that all have "0685". So what do these 2 numbers stand for?

    It means the project is working on site 0685 out of 7874. When the same number shows up on several lines, the same URL is matching a couple of engines. I guess that's the problem you have as well.
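
For anyone who wants to check an exported log for this, here is a rough Python sketch of the two answers above: it maps the bracket marker to its meaning and groups lines by the site number. The exact log line layout is an assumption (marker, then `index/total`, then a colon and the message), and the sample lines and hostnames are made up for illustration, not real SER output. The names `parse_line` and `messages_per_site` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Marker meanings as described in the reply above:
# [+] positive, [-] negative, [!] attention, [ ] or [] neutral.
MARKERS = {"+": "success", "-": "negative", "!": "attention", " ": "neutral", "": "neutral"}

# ASSUMED line shape: "[<marker>] <index>/<total>: <message>".
# The real log layout may differ; adjust the pattern to match your export.
LINE_RE = re.compile(
    r"^\[(?P<marker>[+\-! ]?)\]\s*(?P<index>\d+)/(?P<total>\d+):\s*(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not fit the assumed shape."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "status": MARKERS[match.group("marker")],
        "index": int(match.group("index")),
        "total": int(match.group("total")),
        "message": match.group("message"),
    }


def messages_per_site(lines):
    """Group messages by site number; several entries under one number
    suggest the same URL is being matched against several engines."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            grouped[parsed["index"]].append(parsed["message"])
    return dict(grouped)


# Hypothetical sample lines, invented for this sketch.
sample = [
    "[-] 0685/7874: example-a.test matches engine opera.com, submission failed",
    "[ ] 0685/7874: example-a.test matches engine netlog.com",
    "[+] 0686/7874: example-b.test submission successful",
]

for index, messages in messages_per_site(sample).items():
    print(index, len(messages))
```

Site numbers with an unusually high message count are the ones worth looking at first, since those are the URLs being tried against several engines.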
