Multiple problems = all related to target sites saved under: ... GSA Search Engine Ranker\projects
@sven
multiple problems = all related to target sites saved under:
C:\Users\User\AppData\Roaming\GSA Search Engine Ranker\projects
1.
The function "remove duplicate domains" seems NOT to work on this entire folder above.
There are many duplicate domains in the NEW_TARGETS files.
Maybe the "remove duplicate ..." feature (URLs and Domains) could be extended to all NEW_TARGETS files?
2.
In my main project I have 2 files:
NEW_TARGETS - Files
and
NEW_TARGETS2 - Files
The first target file, however, is FULL of Chinese text = several hundred KB,
impossible to scroll (too slow),
and surely contains NO HTTP URLs.
Can I delete that NEW_TARGETS file
and rename NEW_TARGETS2 to NEW_TARGETS??
3.
When creating a NEW list of target URLs and importing it into a project or Tier,
there is always that small pop-up warning that the list is used first, BEFORE NEW targets are used ...
That can take days (in my case, for a target file of some 5000 URLs = typically 300 kB).
Is there a possibility to change the priority of target URLs and take the NEWEST first, or at least to have a choice,
or
to manually rename the NEW_TARGETS file so the newest is used FIRST??
4.
Another very important missing clarification:
if I have a high quality target list with NEW unique URLs and want to use ONLY that target URL list for submissions ...
When is the NEW_TARGETS file used?
In status > active?
Or in which status?
Or would a new status have to be created?
How do I enable the NEW_TARGETS file only and disable all other URL sources (scraping, etc.)?
The status "global site list only" probably might be wrong (?) - it would MIX NEW targets with old existing targets. The purpose of
a unique NEW_TARGETS file list is to add new target URLs to a particular project or Tier.
What happens to a successfully used / submitted / verified NEW_TARGETS file = are those URLs then available in the global site list for other projects?
5.
Is there a way to FIRST filter a NEW_TARGETS file list when importing new target URLs,
the very same way it is done with Tools > Import URLs (identify platform ...), BEFORE the target URLs are saved into the NEW_TARGETS file,
to remove / filter all URLs that have no matching engine?
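For illustration only, a rough external pre-filter along these lines is what I mean in points 1, 2 and 5 - a minimal sketch that assumes plain-text lists with one URL per line; the file names are made up and this is NOT how SER works internally:

# Rough pre-filter sketch: keep only http/https URLs and drop exact
# duplicate URLs before a list is imported as NEW_TARGETS.
# File names below are only examples (assumption).
def prefilter(in_path, out_path):
    seen = set()
    kept = []
    with open(in_path, "r", encoding="utf-8", errors="ignore") as f:
        for line in f:
            url = line.strip()
            # skip empty lines and anything that is not an http(s) URL
            # (e.g. the Chinese text mentioned in point 2)
            if not url.lower().startswith(("http://", "https://")):
                continue
            if url in seen:  # drop exact duplicate URLs (point 1)
                continue
            seen.add(url)
            kept.append(url)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(kept) + "\n")

prefilter("scraped_targets.txt", "new_targets_clean.txt")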
Comments
1. The duplicate URL removal was never meant to be used for projects, just for site lists.
2. Yes, you can delete all of that. I don't know where the Chinese text comes from though.
3. No, that's not possible with how that whole system works right now.
4. Just disable everything in the project options (search engines and all options below that). Then only imported URLs are used. They are used with priority though, even if you have options checked there; imported sites are always used first (except in the status "active (use global site lists only)").
5. I don't see a point here. Let the project sort things out, else it's double work (first identify, then identify again later to see what engine to use).
thanks for all details
all clear now
Though removing duplicate target URLs might be a useful feature and might be easy to add.
It saves LOTS of resources, especially for those with limited bandwidth or paid data plans.
I can export to SB, dedup, then re-import into SER,
though some others, especially in my countries (3rd world), have no money to purchase SB, nor Linux for file processing with regex.
It would be much quicker than importing to SB, re-importing to SER, etc.,
with limited bandwidth
online tools are totally out of reach, or come at the expense of actual submission performance
my bandwidth is FULLY occupied just for submissions and SB scraping
zero surfing possible on that machine
usually 22-33 threads
Using SB is lightning fast - a built-in SER dedup would be even faster and much more convenient.
computers are about automation and facilitation
there is REAL life and real work outside all this computer/www work
And for special clean-up, a clean-up on my Linux workstation using regex allows an even better result than anything available online.
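For anyone in a similar bandwidth situation, the offline dedup I do instead of the SB round-trip is roughly the following - a minimal sketch, nothing SER-specific, and the file names are only examples:

from urllib.parse import urlparse

# Offline dedup of a plain-text URL list (one URL per line),
# by full URL or by domain - roughly what the SB export / re-import
# round-trip otherwise does. File names are only examples.
def dedup_file(in_path, out_path, by_domain=False):
    seen = set()
    kept = []
    with open(in_path, "r", encoding="utf-8", errors="ignore") as f:
        for line in f:
            url = line.strip()
            if not url:
                continue
            key = urlparse(url).netloc.lower() if by_domain else url
            if key in seen:
                continue
            seen.add(key)
            kept.append(url)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(kept) + "\n")

dedup_file("targets.txt", "targets_no_dup_urls.txt")                     # remove duplicate URLs
dedup_file("targets.txt", "targets_no_dup_domains.txt", by_domain=True)  # remove duplicate domains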
@gooner
I am sure many other SER users are in a similar or worse situation.
For 10+ years I have been working in the Philippine islands, and such situations are the default.
Here in KH it is even worse and much more expensive to use faster 3G data plans.
Cleaning up lists has multiple advantages.
SER uses HUGE resources, and normal laptops often run at around 50% load with frequent phases at 99%.
Every single redundant computing operation that can be saved / prevented by clean-up or dedup makes work more efficient and avoids time-out problems.
On a HIGH quality list (I currently have one running) - approx. 2 hrs with 33 threads = 278 submissions = 321 verified.
In THIS precise situation I can NO longer load a single Firefox page of any submitted article = it simply times out = the system is fully loaded.
A high % of submissions results in high upload traffic for articles and possibly local images.
As a general rule, SEA may be far ahead in mobile broadband coverage and bandwidth;
my current place, Cambodia, may be one of the only exceptions in all of Asia, or at least among SEA / ASEAN countries: due to its small size, too many ISPs and low tech qualifications, prices for 3G/4G are maybe 10x the ones I enjoyed in the Philippines.
Bali is just around the corner (from PH) - just in case you need to relocate.
Maybe do some online research about mobile coverage before going there.
also consider the type of laptop you want to use in Bali
= tropical heat, same as HERE in KH = usually above the ambient operating temperature limit for consumer electronics (the limit is typically 35 degrees C - see the user manual)
Either you need a high performance fan
or you have to work strictly in aircon rooms, with all the related health problems.
the faster your quad CPU = the more heat your laptop develops
the more DDR3 RAM you have = same as above
Until the closure of my full size site in May 2012, I had a high end HP 8740w + 8 GB DDR3 RAM + a mid speed quad CPU,
and I had serious overheating problems even in an aircon room with an additional external fan,
because 35-40 degrees C is normal during certain months.
Heat crashes are the MOST serious computer situations because they destroy the file system / data: they are INSTANT = no saving / no journaling,
while low speed laptops either work without any fan at all or produce less heat.
After my production work finished last year, I NOW use 3 Acer Aspire One machines = NO fan, LONG battery life, fully tropics-proof without aircon, BUT slow as a snail compared to a high end workstation laptop.
and for Internet connectivity
you connect either via a USB 3G/4G dongle OR via a built-in Gobi 3G/4G chipset
or
via your Samsung mobile (Android OS) with its built-in WiFi (I have dual SIM for more options).
Local WiFi may OR may NOT be available and working = you have to CREATE your own alternate solutions if www work is important for earning your livelihood.
and
In the tropics we have tiny ants going INTO the laptop = INTO the HDD to piss on your HDD (impossible if you have an SSD) = the result is a destroyed, NON-recoverable data carrier = simply ALL data destroyed, because ant piss = formic acid = instantly destroying the ferromagnetic surface of the HDD.
I had such a loss years ago on a beautiful island in the Philippines; my local PC dealer said that this is frequent/normal in the tropics ...
I tested the new feature to dedup targets.
NOT sure if it really works, because the answer "Removed 0 duplicate URLs" comes so fast = instant - less than 1 second,
and without a progress bar as in the normal dedup option.
It seems impossible for SER to check a target file of some 539 KB for dups instantly.
Later I may stop the project and create a few duplicates to be 100% sure, but I think something is wrong now,
maybe it is searching the target files in the wrong path or whatever.
Do you also get the small progress-bar popup like in the other dedup option?
If so,
here there is NONE for dedup targets - just the INSTANT "Removed 0 duplicate URLs",
so maybe it is searching in the wrong path here.
Will do some testing in a few hrs - now breakfast time before evening.
I tested again = 40 KB with 3 duplicate URLs.
SER still gives the instant reply "Removed 0 duplicate URLs".
Maybe it is NOT searching in the path
C:\Users\User\AppData\Roaming\GSA Search Engine Ranker\projects
??
YES, running - I processed some 20'000 target sites with LpM <1 during the past 6 hrs,
but not all the time = I have to stop for maintenance and for scraping with SB.
I always did it on a running project,
because until a few days ago the other option to dedup URLs or domains always worked on a running project (since 1-2 upgrades ago, it no longer does).
Here it is still NOT working,
on my Win7 OS (updated approx. weekly):
1. stopped all projects
2. tested the regular options > tools > dedup = ZERO duplicates found
3. rebooted the machine > tools > dedup = 151670 duplicates removed (from 1 day of work)
4. added 4 dups to a target file, then checked dedup targets = "Removed 0 duplicate URLs"
Quite sure that the reply is much too fast (instant).
As for the regular dedup in options > tools > dedup:
that was working even while projects were running until about 1 or 2 upgrades ago (2-3 days ago), then it stopped working.
Already yesterday I noticed that options > tools > dedup shows 0 duplicates before a reboot on stopped projects and 100'000+ duplicates AFTER a reboot.
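To cross-check the instant "Removed 0 duplicate URLs" reply, I count the duplicates directly on disk - a minimal sketch; that the left-over targets sit as plain-text files under the projects path is my assumption:

import collections, glob, os

# Count duplicate lines in the files under the SER projects folder,
# to cross-check the instant "Removed 0 duplicate URLs" reply.
# That these files are plain text with one URL per line is an assumption.
PROJECTS = r"C:\Users\User\AppData\Roaming\GSA Search Engine Ranker\projects"

for path in glob.glob(os.path.join(PROJECTS, "*")):
    if not os.path.isfile(path):
        continue
    counts = collections.Counter()
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        for line in f:
            line = line.strip()
            if line:
                counts[line] += 1
    dups = sum(n - 1 for n in counts.values() if n > 1)
    if dups:
        print(os.path.basename(path), "=", dups, "duplicate lines")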
I still have the problem of MANY duplicate URLs in the target cache.
1.
The function > import target URLs > remove duplicates seems NOT to work.
There are many duplicate URLs in > show urls > show left target urls,
usually groups of up to dozens or more absolutely identical URLs, totaling thousands or up to 10+k URLs,
yet still an instant "Removed 0 duplicate URLs" message.
Instant = ZERO time = NO access to any folder or drive is possible in that time.
The above was done with ALL projects stopped = all inactive, and also repeated after SER was shut down and restarted
= same result = an instant "Removed 0 duplicate URLs" message.
I just deleted ALL target URL cache content for all projects and tiers, because up to 10'000+ URLs in bunches of duplicates had accumulated PER Tier or project.
The normal > options > tools > remove duplicate URLs = works perfectly.
ONLY the target URL cache > show urls > show left target urls keeps growing in duplicates, with no way to remove them other than emptying the cache.
2.
Is there a way to delete duplicate URLs in the global site lists > identified / submitted / verified
??
I thought this new function in the import target URLs was doing just that? (See the clean-up sketch after point 3.)
3.
Another problem, mentioned somewhere much earlier, is that in the target URL cache
> show urls > show left target urls
I find lots of URLs ending with a pipe "|" (of course only those that have a PR = those coming from the site list submitted or verified).
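Until this is possible inside SER, a combined external clean-up is what I had in mind for points 2 and 3 - a minimal sketch; that exported site lists / target lists are plain text with one URL per line, and that everything after the first "|" is only appended PR data, are my assumptions, and the file names are only examples:

# Clean an exported URL list: cut off everything after the first "|"
# (the appended PR data from point 3) and drop duplicate URLs (point 2).
# Assumes a plain-text export with one URL per line; file names are examples.
def clean_list(in_path, out_path):
    seen = set()
    kept = []
    with open(in_path, "r", encoding="utf-8", errors="ignore") as f:
        for line in f:
            url = line.strip().split("|", 1)[0].strip()
            if not url or url in seen:
                continue
            seen.add(url)
            kept.append(url)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(kept) + "\n")

clean_list("verified_export.txt", "verified_clean.txt")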
P.S.:
currently ONLY working with global site lists = ALL SEs = OFF
I currently have some 342'000+ URLs identified = enough to submit for weeks, and I am adding new ones daily
Important:
my overall performance over the past several days has been excellent, and the submitted-to-verified ratio is 50-85% most of the time.
Hence the above problems with accumulating duplicates seem to have little or NO effect on SER's verified submission performance.