
Multiple problems = all related to target sites saved under: ... GSA Search Engine Ranker\projects

@sven

multiple problems = all related to target sites saved under:

C:\Users\User\AppData\Roaming\GSA Search Engine Ranker\projects

1.
The function "remove duplicate domains" seems NOT to work on this entire folder above:
many duplicate domains remain in the NEW_TARGETS - Files.

Maybe the "remove duplicate ..." feature (URLs and Domains) could be extended to all NEW_TARGETS - Files?


2.
On my main project I have 2 files:
NEW_TARGETS - Files
and
NEW_TARGETS2 - Files

The first target file, however, is FULL of Chinese text = several hundred KB,
impossible to scroll (too slow),
and surely contains NO HTTP URLs.

Can I delete that NEW_TARGETS file
and rename NEW_TARGETS2 to NEW_TARGETS??



3.
When creating a NEW list of target URLs and importing it into a project or Tier,
there is always that small pop-up warning that the existing list is used first, BEFORE the NEW targets are used ...

That can take days (in my case, for a target file of some 5000 URLs = typically 300 kB).

Is there a possibility to change the priority of target URLs and take the NEWEST first, or to have a choice,
or
to manually rename the NEW_TARGETS file so the newest are used FIRST??


4.
Another very important missing clarification:

If I have a high-quality target list with NEW unique URLs and want to use ONLY that target URL list for submissions ...

When is the NEW_TARGETS file used?

In status > active?
Or in which other status?
Or would a new status have to be created?

How do I enable the NEW_TARGETS file only and disable all other URL sources (scraping, etc.)?

A status "global site list only" would probably be wrong (?) - it would MIX NEW targets with old existing targets; the purpose
of a unique NEW_TARGETS file list is to add new target URLs to a particular project or Tier.

What happens to a successfully used / submitted / verified NEW_TARGETS file = are those URLs then available in the global site list for other projects?

5.
Is there a way to FIRST filter a NEW_TARGETS file list when importing new target URLs,

the very same way it is done using Tools > Import URLs (identify platform ....), BEFORE saving the target URLs into the NEW_TARGETS file,
to remove / filter all URLs that have no matching engine?
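
(For illustration only - not a SER feature: a rough offline Python sketch of the kind of pre-filter meant in points 2 and 5 above, which keeps only lines that look like HTTP/HTTPS URLs and drops exact duplicates before the file is imported. The file names are made-up examples.)

# pre_filter_targets.py - hedged sketch of an offline pre-filter for a target
# URL list before importing it into SER (not part of SER itself).
# It drops non-URL lines (e.g. stray Chinese text) and exact duplicate URLs.
def pre_filter(in_path, out_path):
    seen, kept = set(), []
    with open(in_path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            url = line.strip()
            # keep only lines that look like HTTP/HTTPS URLs
            if not url.lower().startswith(("http://", "https://")):
                continue
            # drop exact duplicate URLs, keeping the first occurrence
            if url in seen:
                continue
            seen.add(url)
            kept.append(url)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(kept) + "\n")
    return len(kept)

# example file names are assumptions, not real SER file names:
if __name__ == "__main__":
    print(pre_filter("new_targets_raw.txt", "new_targets_clean.txt"), "URLs kept")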

Comments

  • SvenSven www.GSA-Online.de

    1. The duplicate URL removal was never meant to be used for projects, just for site lists.

    2. Yes, you can delete all that. I don't know where the Chinese text comes from, though.

    3. No, that's not possible with how that whole system works right now.

    4. Just disable everything in the project options (search engines and all options below that). Then only imported URLs are used. They are used with priority even if you have options checked there; imported sites are always used first (except with the status active (global site lists only)).

    5. I don't see a point here. Let the project sort things out, otherwise it's double work (first identify, then later identify again to see which engine to use).

  • @sven
    thanks for all the details
    all clear now

    though the "remove duplicate target URLs" option might be a useful feature and might be easy to add;
    it saves LOTS of resources, especially for those with limited bandwidth or paid data plans

    I can export to SB, dedup, then re-import into SER again;
    some others, especially in 3rd-world countries like mine, have no money to purchase SB or no Linux for file processing using regex
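
    (For anyone in the same situation without SB, a minimal offline sketch of the same idea in Python: keep only one URL per domain in a plain-text target list, roughly what a "remove duplicate domains" pass would do. The function and file handling here are just an example, not SER internals.)

    # dedup_domains.py - hedged sketch: offline stand-in for an SB / SER
    # "remove duplicate domains" pass on a one-URL-per-line text file.
    from urllib.parse import urlparse

    def dedup_by_domain(in_path, out_path):
        seen_domains, kept = set(), []
        with open(in_path, encoding="utf-8", errors="ignore") as f:
            for line in f:
                url = line.strip()
                if not url:
                    continue
                domain = urlparse(url).netloc.lower()
                # keep only the first URL seen for each domain
                if domain and domain not in seen_domains:
                    seen_domains.add(domain)
                    kept.append(url)
        with open(out_path, "w", encoding="utf-8") as f:
            f.write("\n".join(kept) + "\n")
        return len(kept)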
  • SvenSven www.GSA-Online.de
    OK will think about it.
  • goonergooner SERLists.com
    edited September 2013
    @hans51 - Why not just use an online de-dup tool and then import from the clipboard into SER?
    Much quicker than importing to SB, re-importing to SER, etc.
  • @gooner
    with limited bandwidth,
    online tools are totally out of reach, or come at the expense of actual submission performance;
    my bandwidth is FULLY occupied just with submissions and SB scraping,
    zero surfing possible on that machine,
    usually 22-33 threads
    using SB is lightning fast - a built-in SER dedup would be even faster and much more convenient

    computers are about automation and facilitation;
    there is REAL life and real work outside all this computer/www work

    and for special clean-up, a clean-up on my Linux workstation using regex allows an even better clean-up, never available online
  • goonergooner SERLists.com
    ah ok i didn't realise your bandwidth was limited, sorry about that.
  • SvenSven www.GSA-Online.de
    next version will have this added.
  • @sven :) thanks for your helpful work

    @gooner
    I am sure many other SER users are in a similar or worse situation;
    for 10+ yrs I have been working in the Philippine islands, and such situations are the default;
    here in KH it is even worse and much more expensive to use faster 3G data plans

    cleaning up lists has multiple advantages:
    SER uses HUGE resources, and normal laptops often run at 50% +/- with frequent phases at 99%;
    every single obsolete computing operation that can be saved / prevented by clean-up or dedup makes the work more efficient and avoids time-out problems

    on a HIGH quality list (currently I have one such running) - approx 2 hrs with 33 threads = 278 submissions = 321 verified
    in THIS precise situation I can NO longer load a single Firefox page of any of the submitted articles = simply time out = system fully loaded;
    a high % of submissions results in high UP-load traffic for articles and possible local images

  • goonergooner SERLists.com
    @hans51 - It's something I didn't think about before, but next year I will probably be working from Bali, so possibly I will experience the same problems - I haven't researched internet connections there yet.
  • @gooner
    as a general rule, SEA may be far ahead in mobile broadband coverage and bandwidth;
    my current place, Cambodia, may be one of the only exceptions in all of Asia, or at least among the SEA / ASEAN countries - due to its small size, too many ISPs and low tech qualifications, prices for 3G/4G are maybe 10x the ones I enjoyed in the Philippines

    Bali is just around the corner (from PH) - just in case you need to relocate;
    maybe do some online research about mobile coverage before going there
  • goonergooner SERLists.com
    @hans51 - Thank you for the info. Yes i need to research for sure.
  • edited September 2013
    @gooner

    also consider the type of laptop you want to use in Bali
    = tropical heat, same as HERE in KH = usually above the ambient operating temperature limit for consumer electronics (the limit is typically 35 degrees C - see the user manual)

    either you need a high-performance fan
    or you have to work strictly in aircon rooms, with all the related health problems

    the faster your quad CPU = the more heat your laptop develops
    the more DDR3 RAM you have = same as above

    until the closure of my full-size site in May 2012, I had a high-end HP 8740W + 8 GB DDR3 RAM + mid-speed quad CPU,
    and I had serious overheating problems even in an aircon room with an additional external fan,
    because 35-40 degrees C is normal during certain months

    heat-crashes are the MOST serious computer situations because they destroy the file system / data: they are INSTANT = no saving / no journaling

    low-speed laptops, on the other hand, either work without a fan at all or produce less heat;
    after my production work finished last year, NOW I use 3 Acer Aspire Ones = NO fan, LONG battery life, fully tropics-proof without aircon, BUT slow as a snail compared to a high-end workstation laptop

    and for Internet connectivity:

    you connect either via USB to 3G/4G, OR via a built-in Gobi 3G/4G chipset,
    or
    via your Samsung mobile (Android OS) with built-in wifi (I have dual SIM for more options);
    local wifi may OR may NOT be available and working = you have to CREATE your own alternate solutions if www work is important for you to earn a livelihood

    and
    in the tropics we have tiny ants going INTO the laptop = INTO the HDD to piss on your HDD (impossible if you have an SSD) = the result is a destroyed, NON-recoverable data carrier = simply ALL data destroyed, because ant piss = formic acid = instantly destroying the ferro-magnetic surface of the HDD
    I had such a loss yrs ago on a beautiful island in the Philippines; my local PC dealer said that this is frequent/normal in the tropics ...

  • @sven

    I tested the new feature to dedup targets.
    NOT sure if it really works, because the answer "Removed 0 duplicate URLs" comes so fast = instant - less than 1 second,
    and without a progress bar as in the normal dedup option;
    it seems impossible for SER to test a target file of some 539 KB for dups instantly

    later I may stop the project and create a few duplicates to be 100% sure, but I think something is wrong now;
    maybe it is searching for the target files in the wrong path or whatever
  • SvenSven www.GSA-Online.de
    Hmm worked fine for me. Imported the same targets (one URL) 3 times and 2 got removed afterwards.
  • @sven

    do you also get the small progress-bar popup like in the other dedup option?
    if so,
    here there is NONE for dedup targets - just the INSTANT "Removed 0 duplicate URLs";
    then maybe it is searching in the wrong path here

    will do some testing in a few hrs - now breakfast time before evening
  • @sven

    I tested again = 40 KB with 3 duplicate URLs;
    still SER gives the instant reply "Removed 0 duplicate URLs"

    maybe it is NOT searching in the path

    C:\Users\User\AppData\Roaming\GSA Search Engine Ranker\projects

    ??
  • SvenSven www.GSA-Online.de
    Is the project running? No progressbar is shown here though.
  • @sven
    YES running - I processed some 20'000 target sites with LpM <1 during the past 6 hrs

    but not all the time = I have to stop for maintenance and for scraping with SB

  • SvenSven www.GSA-Online.de
    OK, that "clear duplicates" is not working if the project is running. You should do it when it's inactive.
  • OK, thanks for the clarification.
    I always did it on a running project,

    because until a few days ago the other option to dedup URLs or domains always worked on a running project (since 1-2 upgrades ago it no longer does).
  • @sven

    here it is still NOT working
    on my Win7 OS (updated approx weekly)

    1. stopped all projects
    2. tested the regular options > tools > dedup = ZERO dups found
    3. rebooted the machine > tools > dedup = 151670 dups removed (from 1 day of work)
    4. added 4 dups to a target file, then checked dedup targets = "Removed 0 duplicate URLs"

    quite sure that the reply is much too fast (instant)

    as for the regular dedup in options > tools > dedup:
    that was working even while projects were running, until about 1 or 2 upgrades ago (2-3 days ago), then it stopped working;
    already yesterday I noticed that options > tools > dedup shows 0 dups before a reboot (on stopped projects) and 100'000+ dups AFTER a reboot
  • @sven

    I still have the problem of MANY duplicate URLs in the target cache

    1.
    the function > import target URLs > remove duplicates seems NOT to work:
    many duplicate URLs in > show urls > show left target urls,
    usually groups of up to dozens or more absolutely identical URLs, totalling thousands or up to 10+k URLs

    still an instant "removed 0 duplicate URLs" message;
    instant = ZERO time = NO access to any folder or drive is possible in that time

    the above was done with ALL projects stopped = all inactive, and was also repeated after SER was shut down and restarted
    = same result = instant "removed 0 duplicate URLs" message

    I just deleted ALL target URL cache content for all projects and tiers, because up to 10'000+ URLs in bunches of duplicates had accumulated PER Tier or project

    the normal > options > tools > remove duplicate URLs = works perfectly

    ONLY the target URL cache > show urls > show left target urls has a growing number of duplicates and no way to remove those except by emptying the cache

    2.
    is there a way to delete duplicate URLs in the global site lists > identified / submitted / verified
    ??

    I thought this new function in the import target URLs was doing just that?

    3.
    another problem, mentioned somewhere much earlier, is that in the target URL cache
    > show urls > show left target urls
    I find lots of URLs ending with a pipe "|" (of course only those that have a PR = those from the submitted or verified site list) - see the small clean-up sketch after this post

    P.S.:
    currently ONLY working with global site lists = ALL SE = OFF;
    I currently have some 342'000+ URLs identified = enough to submit for weeks, and I am adding new ones daily

    important:
    my overall performance for several days now is excellent and the submit-to-verified ratio is 50-85% most of the time;
    hence the above problems with accumulating duplicates seem to have little or NO effect on SER verified submission performance
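
    (Illustration for point 3, not a SER function: assuming the part after the pipe "|" is appended metadata such as the PR, a small offline clean-up in Python could strip it and drop the duplicates before re-importing the list - a sketch, with made-up file handling:)

    # strip_pipe_suffix.py - hedged sketch: remove everything from the first
    # "|" onward (assumed to be appended metadata such as PR) and deduplicate.
    def clean_piped_urls(in_path, out_path):
        seen, kept = set(), []
        with open(in_path, encoding="utf-8", errors="ignore") as f:
            for line in f:
                url = line.strip().split("|", 1)[0]  # drop any "|..." suffix
                if url and url not in seen:
                    seen.add(url)
                    kept.append(url)
        with open(out_path, "w", encoding="utf-8") as f:
            f.write("\n".join(kept) + "\n")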
  • Got to agree with Hans51 about living in SE Asia. I was in Thailand for 4 years and without a VPS you're screwed. Internet goes off every 5 minutes during rainy season. Laptop heats up and dies (went through 5 laptops in 4 years, all different, all with cooling fans under them, makes no difference).

    Gooner, the difference between Thailand and the UK - and I suppose that goes for Bali and the UK too - is monumental in terms of infrastructure. I moved back last year and haven't looked back. Earnings doubled, clients trust you if they can pick up the phone and call easily, clients trebled, and no more headaches from screaming at lame internet connections and power outages (all year round, not limited to rainy season). Running a laptop/PC in 40C is impossible. Having air con on all day costs more than winter gas and electric in the UK, all year round.

    Gimme a shout if you have any questions.