
Import Target URLs & Show Remaining Target URLs

Not sure if this is a bug or if I need help.

So I scraped a list of URLs with Scrapebox using the GSA SER footprints, de-duplicated by URL/domain, and was left with a list of 540,000.

I attempted to import this list into GSA SER (Right click > Import Target URLs > From File) on a dummy project.

The pop-up appears and says "Imported 540,000 URLs to the target project..."

When I click show URLs (Right Click > Show URLs > Show Remaining Target URLs)

It only shows 290 URLs?

I then thought I would take a look at my URL list, even though GSA confirmed 540,000 were imported.

On line 290 there was a URL like the following (I replaced the actual domain with 'domain'; the rest is exactly the same):

http://www.domain.com/index.php/اقسام-اخرى/متفرقات/10229-هل-حان-وقت-خروج-المهدي-و-عودة-المسيح-؟.html

So I removed this URL, cleared the URL cache so the list was empty, tried again, checked the Show URLs option, and it now shows 65,000 URLs.

I went back to the URL list and at line 65,000 was another URL like the one above. I removed it, repeated the previous steps, and it now shows 350,000 URLs.

Anyone know why this happens? 

Even if it only shows 290 URLs, will it still post to my 500k?
If not, would I need to manually check files to remove dodgy URLs like the one above in order to import the full list?

Thanks
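
For a list this size, checking by hand is impractical. A minimal Python sketch, assuming the list is a plain text file with one URL per line, could report which line numbers contain non-ASCII characters such as the Arabic path above. The function name `find_non_ascii_lines` and the file name `urls.txt` are hypothetical, and the script only locates such lines; it does not establish that they are what stops the import.

```python
def find_non_ascii_lines(path, encoding="utf-8"):
    """Return (line_number, text) pairs for lines containing non-ASCII characters.

    Assumption: the file decodes as `encoding`; undecodable bytes are replaced
    with U+FFFD, which is itself non-ASCII, so those lines are reported too.
    """
    hits = []
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\r\n")
            if not text.isascii():
                hits.append((number, text))
    return hits


# Hypothetical usage:
# for number, text in find_non_ascii_lines("urls.txt"):
#     print(number, text)
```

Line numbers are 1-based, so they can be compared directly with the count shown under Show Remaining Target URLs.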

Comments

  • UKCAS87 England
    edited January 2015
    Additional info:

    It appears that sometimes it happens at normal URLs too, e.g. http://www.domain.com/articles/cleaning-bathtub/

    So does anyone know why the full list doesn't show in 'Show URLs'?

    Thanks

    EDIT:

    After removing the odd link here and there, I finally managed to get my dummy projects to show 540,000 URLs after importing.

    However, when running the projects now (5 dummy projects), after 5-20 minutes the lists have either emptied and the project has stopped, or have, say, 50k links left.

    I'm only running at 50 threads.

    Thanks
  • Sven www.GSA-Online.de
    Can you send that file?
  • UKCAS87 England
    edited January 2015
    Hi, no problem. I wasn't sure how to send the file, so I uploaded it to MediaFire and sent you a PM with the link + password.

    It's probably something I'm doing wrong, but for the life of me I can't work it out. This is what I'm doing now:

    Clear the URL cache
    Import the new list (it says the correct 540,000 imported)
    Click Start; it just says 'setting up project, starting project' and nothing happens
    Click Show Remaining Target URLs; it now shows half missing
    Click Stop on the project; the URLs have gone

    I may try reinstalling it, because prior to this it had searched and posted fine since purchase.

    Thanks


    EDIT: I reinstalled GSA SER and the project still stops at 'starting project', then after a couple of minutes says '20:36:54: [ ] Attention! No targets to post to (no search engines chosen, no site list enabled, no url extraction chosen, no scheduled posting)'

    Even if I right-click the project and import a list while it is active, it does the same thing after a couple of minutes. I tried several lists I hadn't previously tried.

    Thanks
  • Just like to add, I deleted everything in appdata/roaming and the lists seem to be working fine now.

    Not sure why it happened, but it's sorted.

    Thanks
  • I am having the exact same problem and am frustrated by it. SER will show over 300k URLs imported, but once I run the project for a second it will say it can't find target URLs. When I recheck the remaining target URLs, it shows a lower number like 17k, 200, or 4,000 (I will include one screenshot: empty spaces, some upper case, some lower case).

    I tried two servers: same thing. I tried saving the text file in different formats and tried copying from the clipboard. I tried putting it in a folder, pointing the failed folder to it, and setting the project to pull from failed. Nothing works.

    A list from this provider worked in the past, before the last two updates. I even tried copying the URL list right from their website.

    What did you mean by "deleted everything"? Did you mean EVERYTHING in the AppData Roaming folder, or just everything in the list folders? https://www.dropbox.com/s/7xfl9y2n4cmmeia/messed up gsa import.png?dl=0
  • vic home
    From the 2nd screenshot, I can see the problem is something to do with the file encoding when you export the list from SB.
  • I know what it is. It's the encoding of your file. I guess you're using Scrapebox v2 beta, or Windows Server 2012?

    In Notepad++, change your encoding to UTF-8 and save the file.
  • [ SOLVED ]  Using Notepad++ to convert to UTF-8

    This was a list from a provider. The other months worked fine until this list.
    I did try saving as UTF-8 with a different program, but the Notepad++ conversion is what worked.
    The list as downloaded was in UCS-2 Little Endian encoding.

    Funny: it's supposed to be over 300k URLs, but when opened in Notepad++ it shows half of that, 157k.

    SER is purring like a kitten again. Thanks for the help!!!
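
The Notepad++ conversion described above can also be scripted. A minimal Python sketch, assuming the downloaded list starts with a byte-order mark (as files saved as UCS-2 Little Endian normally do) and that a file without one is already UTF-8. The name `convert_to_utf8` and the file names are hypothetical. The thread does not say whether SER wants UTF-8 with or without a BOM; this sketch writes it without one.

```python
import codecs


def convert_to_utf8(src_path, dst_path):
    """Re-encode a URL list as UTF-8 (no BOM) and return its line count.

    Assumption: UTF-16/UCS-2 input is identified by its BOM; anything without
    a UTF-16 BOM is treated as UTF-8 and will raise if it is not.
    """
    with open(src_path, "rb") as handle:
        raw = handle.read()

    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        text = raw.decode("utf-16")
    else:
        # "utf-8-sig" strips a UTF-8 BOM if one is present.
        text = raw.decode("utf-8-sig")

    # newline="" keeps the original line endings untouched.
    with open(dst_path, "w", encoding="utf-8", newline="") as out:
        out.write(text)
    return len(text.splitlines())


# Hypothetical usage:
# print(convert_to_utf8("list_from_provider.txt", "list_utf8.txt"))
```

The returned line count can be compared with the number SER reports after importing the converted file.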
  • No problem! Glad to help you