
How to bulk import unique chunks of URLs into multiple projects (& awesome feature suggestion)

After I complete a scrape with Hrefer, I usually split the file into 20-50 chunks of 30k-300k URLs each, depending on how big the original scraped text file is. Every URL is unique across all the files. It looks like this: [image]

After splitting the files, I go into SER, where I have 20-50 processing projects numbered like this: [image]

Then I import the corresponding split file into each project, e.g. Split 1 gets imported into Processing project 1, Split 2 into Processing project 2, and so on. I have to do this manually for each project and it takes forever. Does anyone have suggestions on how to automate this a bit, or a better way to do it? @sven?

If not, I have a feature suggestion. A lot of us process large lists in a similar fashion rather than importing 1 huge file into 1 project. What if we could simply select our group of "Processing Projects", right-click, select import target URLs from folder, and have SER automatically take 1 text file from that folder per project and import each one into its own project?

Alternatively, we could select our group of "Processing Projects", right-click, select import target URLs from folder, and have SER take 1 HUGE text file, split it into proportionate chunks based on the number of processing projects selected, and import one chunk into each project. This would be AWESOME.
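In the meantime, the splitting half of this can be scripted outside SER. Here is a minimal Python sketch, assuming the scrape is a plain text file with one URL per line. The function name `split_urls` and the `split_N.txt` file names are just placeholders I made up, not anything from SER or Hrefer. It drops duplicate URLs and writes one roughly equal chunk per project:

```python
from pathlib import Path


def split_urls(source: Path, out_dir: Path, n_chunks: int) -> list[Path]:
    """Deduplicate the URLs in `source` and write them into n_chunks files."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")

    # dict.fromkeys drops duplicates while keeping first-seen order.
    with source.open(encoding="utf-8", errors="ignore") as fh:
        urls = list(dict.fromkeys(line.strip() for line in fh if line.strip()))

    out_dir.mkdir(parents=True, exist_ok=True)

    # The first `extra` chunks get one more URL so sizes differ by at most 1.
    base, extra = divmod(len(urls), n_chunks)
    paths = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunk = urls[start:start + size]
        path = out_dir / f"split_{i + 1}.txt"  # hypothetical naming scheme
        path.write_text("".join(url + "\n" for url in chunk), encoding="utf-8")
        paths.append(path)
        start += size
    return paths
```

Calling something like `split_urls(Path("scraped.txt"), Path("splits"), 20)` would give 20 files to import, one per processing project. It only covers the splitting, the import itself still has to be done in SER.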

Thanks!

Comments

  • edited March 2014
    I think SER does this already. Highlight all the projects you wish to do this to and import target URLs normally, and you will get an option to spread the target URLs across these projects.
  • Thanks @jpvr90. Don't know how I didn't realize this before. It's because I was only selecting 1 project at a time to import target urls from file and if you do that, you obviously never get that prompt. Awesome.
  • @Justin, don't you think that you may be skipping some submissions that way as some engines may be detected wrong? (like they should be Blogs, but get detected as article, etc.). Just wanted to hear your opinion on this matter.
  • What do you mean exactly @nikodim?
  • Why would they be detected wrong?
  • edited March 2014

    @Justin, I've noticed that sometimes SER detects one engine as another if they are both ticked. I don't have samples on hand, unfortunately.

    I wanted to know if it is viable to keep, for example, 20 instead of 10 Sorting projects but with different engines, i.e. 

    1st Project with Articles, Wikis, Web 2.0

    2nd Project with all other engines

    3rd Project again with Articles, Wikis, Web 2.0 

    etc.

    This way, for example, Articles won't get confused with Blog Comment.

  • I'm assuming the reason they can be confused is that the "page must have" variables are sometimes the same across some sites, which in rare circumstances results in misidentification. That seems like a lot of work though, honestly.
  • @Sven - if I import a text file containing 100k URLs by selecting 4 projects with control shift, right clicking, and selecting import target URLs by file, then selecting that text file of 100k targets, and then clicking yes when it asks if I want to split the target URLs between all projects, will it get all targets and split them evenly across all 4 projects or is there a chance it will miss some?
  • Sven www.GSA-Online.de
    It is not missing anything.
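For anyone who wants to double-check a split on their own lists, here is a small Python sketch of what "split evenly and miss nothing" means. This is NOT SER's actual splitting logic, which isn't described in this thread. `distribute` and `covers_all` are hypothetical helper names, and round-robin is just one assumed way of dealing URLs out to projects:

```python
from collections import Counter


def distribute(urls: list[str], n_projects: int) -> list[list[str]]:
    """Deal URLs out round-robin, one list per project (assumed scheme)."""
    return [urls[i::n_projects] for i in range(n_projects)]


def covers_all(urls: list[str], buckets: list[list[str]]) -> bool:
    """True if the buckets together contain exactly the original URLs."""
    merged = Counter(url for bucket in buckets for url in bucket)
    return merged == Counter(urls)
```

With 100k URLs and 4 projects, `distribute` gives 4 lists of 25k each, and `covers_all` confirms that every target ended up in exactly one of them.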