Scraping Question
I have recently bought Scrapebox and have started building my first list. So far I have 137k URLs (de-duped) and growing. These are all based on footprints etc. for GSA.
Do I need to limit the size in any way, or can I just keep building a massive list? Also, do you guys build one big list, or do you save separate files for all the different footprints that you scrape?
Thanks
Comments
1) For dedup, I was wondering if we should delete duplicate domains? I have just been deleting duplicate URLs.
2) Project settings > data > tools > import target urls... is there a limit to how many URLs can be added?
I just imported a list of 120k+ (duplicate URLs removed), but was wondering if there is an import limit of some sort, because Scrapebox can come back with pretty big lists.
Thanks for the help!
1) This has no influence on speed, at least. But if you like to have everything sorted and organized to perfection, you should delete duplicate domains for all engines except blog/image comments.
2) No, you can add as many as you want. Not all URLs are loaded at once; just 1MB of the file is read, and when that is done, the next 1MB.
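A minimal sketch of that chunked approach, assuming a plain text file with one URL per line. The file name, chunk size, and function name here are illustrative; SER's actual internals are not public, so treat this as a sketch of the idea, not its implementation:

```python
def iter_urls_in_chunks(path, chunk_size=1024 * 1024):
    """Yield URLs from a large list file without loading it all into memory.

    Reads roughly chunk_size bytes at a time, carrying any partial last
    line over into the next chunk (the "1MB at a time" idea from above).
    """
    leftover = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Flush whatever remains after the final read.
                if leftover.strip():
                    yield leftover.decode("utf-8", "replace").strip()
                return
            chunk = leftover + chunk
            lines = chunk.split(b"\n")
            leftover = lines.pop()  # possibly incomplete last line
            for line in lines:
                url = line.decode("utf-8", "replace").strip()
                if url:
                    yield url

# Usage: process a 120k+ line list with roughly constant memory.
# for url in iter_urls_in_chunks("targets.txt"):
#     handle(url)
```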
It's been over 12 hours now and I can still see it going through the URLs.
@Sven - if I close the program...
a) does it lose the imported URLs, or
b) will it continue from where it left off (i.e. continue to process from the last URL), or
c) start from the first imported URL again?
Thanks
"...have everything sorted and organized to perfection, you should delete duplicate domains for all engines except blog/image comments."
So for posting, the full URLs matter only for:
- blog comments
- image comments
But I thought they would matter for trackbacks as well? Please confirm.
My lists are getting too large if I just dedup URLs.
Just 4 platforms need to keep duplicate domains (see the sketch below):
- blog comments
- image comments
- trackback
- pingback
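A minimal sketch of that split, assuming each scraped URL has already been sorted by platform. The platform labels and the mapping structure are assumptions for illustration, not anything SER or Scrapebox exposes:

```python
from urllib.parse import urlparse

# Platforms where each URL is a distinct posting target, so
# duplicate domains must be kept (per the list above).
KEEP_DUPLICATE_DOMAINS = {"blog_comment", "image_comment", "trackback", "pingback"}

def dedup(urls_by_platform):
    """Dedup a {platform: [urls]} mapping.

    URL-level duplicates are always removed. Domain-level duplicates
    are also removed, except for the four per-URL platforms above.
    """
    result = {}
    for platform, urls in urls_by_platform.items():
        unique_urls = list(dict.fromkeys(urls))  # dedup URLs, keep order
        if platform in KEEP_DUPLICATE_DOMAINS:
            result[platform] = unique_urls
            continue
        seen_domains = set()
        kept = []
        for url in unique_urls:
            domain = urlparse(url).netloc.lower()
            if domain not in seen_domains:
                seen_domains.add(domain)
                kept.append(url)
        result[platform] = kept
    return result
```

This keeps list sizes down for the engines where one URL per domain is enough, without throwing away valid per-page targets on the comment/trackback platforms.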
Thanks @Sven