
Big domain scrape (by extension) reward


Comments

  • Kaine

    I want to participate. Is the dico.uk.txt still available?

    Thanks
  • Currently at just over 5%, running at about 40k/minute
  • Kaine thebestindexer.com
    edited March 2014
    @Doncorleone It doesn't bother me, I'm just trying to hold things together so that nobody is left out and everyone gets their gift :) (I deleted it on your wall and sent you a PM.)

    I think mine will be available today (it will be unlocked for those who have finished).
    I just created an account on Mega where everything will be listed. Only the workers will have the password.

    @Thunderman: I'll send it by PM :)

    @jjumpm2: Ok this is noted ;)

    @vifa: I don't know if you have taken your list (no trace in my inbox). // EDIT: solved.

    This lets us know how much we have in total and estimate how long it will take to inject it all into our SER projects.
  • Kaine thebestindexer.com
    .Uk is now taken by @Thunderman

    New update (only 8 seats left):

  • I would also like to participate.
  • Kaine thebestindexer.com
    edited March 2014
    Sure, do you have a preference? :)

    Edit:

    .Eu sent to @coneh34d by PM (no difference between us all, I assigned them in order).
  • Kaine thebestindexer.com
    New update (only 7 seats left):

  • Kaine thebestindexer.com
    edited March 2014
    OK, I PM'd you .co :)

    .Co is now taken by @accelerator_dd
  • Can I join in? I'll take .ac.uk if that's OK?
  • I'll take one of your choice, PM me with instructions :)
  • edited March 2014
    Can you please PM me what is required to do this?
  • Kaine thebestindexer.com
    Hi all :)

    @JudderMan: .ac.uk is not taken, it's OK to work on :)

    @antonyalston and @tommy99: I sent you PMs with lists to work on too.
  • Kaine thebestindexer.com
    edited March 2014
    OK, it's time for an update :)

    Now domain.AC.UK is taken by @JudderMan

    @antonyalston has domain.INFO

    @tommy99 holds domain.IT


    Now there are only 3 seats left... move fast!

    .BE
    .ES
    .BIZ
  • Kaine thebestindexer.com
    edited March 2014
    Domain.FR list: job done.

    Contains 1,253,331 unique domains.

    File uploaded on Mega.
  • Kaine thebestindexer.com
    edited March 2014
    Deleted ;)



  • In for .es, let me know if I am selected.
  • Kaine thebestindexer.com
    edited March 2014
    Yes you are :) I PM'd you.

    Update:

    Now there are only 2 seats left.

    .BE
    .BIZ


    I'm working on a huge list (maybe 1 or 2 GB) to be divided among those who wish to participate.
    This list is only for domain.COM at this time, so together we will pull the maximum out of Google.
    Once your current list is done, let me know if you are interested in a section.



  • Kaine thebestindexer.com
    edited March 2014
    I just finished this new list, which is fairly substantial.
    It is cut into chunks of 1 million lines.

    Out of curiosity I did a little scrape on the first list; the result:

    In 10 minutes I got 1,018,373 results for 207,867 unique domains (trimmed to root) ;)
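    For anyone who wants to reproduce that trim-to-root step outside the scraper, here is a minimal sketch in Python (the file names are hypothetical, and it assumes each line of the scrape output is a full URL with a scheme):

        from urllib.parse import urlparse

        def trim_to_root(url: str) -> str:
            # Reduce a full URL to its bare host, e.g.
            # http://www.example.com/page?id=1 -> example.com
            host = urlparse(url.strip()).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            return host

        # Stream the raw scrape results, keeping the first occurrence of each domain.
        seen = set()
        with open("scrape-results.txt") as src, open("unique-domains.txt", "w") as dst:
            for line in src:
                domain = trim_to_root(line)
                if domain and domain not in seen:
                    seen.add(domain)
                    dst.write(domain + "\n")

        print(len(seen), "unique domains")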
  • I am currently at 26,678,815, just below 25% complete... I am starting to wish I had split the file!
  • Kaine thebestindexer.com
    You haven't selected the Remove Duplicate URL option while scraping?

    Every 1 GB or so, I trim and dedupe to avoid that :)
  • :) For some reason I didn't think it would be that many... Oh well, it will all be good in the end.
  • Kaine thebestindexer.com
    ^^

    Even using this option, I would say it generates about 5 GB of URLs :)
  • edited March 2014
    I'll take .biz and .be
  • edited March 2014
    I'll take whatever is free. I have spare resources on a new scraping-only VPS.
    Please let me know the details.
  • Kaine thebestindexer.com
    edited March 2014
    OK, I assign the last two lists to @Yashar and @TOPtActics.

    I prefer to keep the resources more spread out, Yashar.


    jjumpm2  is responsible for domain.CO.UK

    fakenickahl  is responsible for domain.COM

    DonCorleone is responsible for domain.DE

    Justin is responsible for domain.NET

    ewandy is responsible for domain.ORG

    gooner is responsible for domain.EDU

    Trevor_Bandura is responsible for domain.CA

    vort3x is responsible for domain.CH

    vifa is responsible for domain.GOV

    @kaine is responsible for domain.FR

    @Thunderman is responsible for domain.UK

    coneh34d is responsible for domain.EU

    accelerator_dd is responsible for domain.CO

    JudderMan is responsible for domain.AC.UK

    antonyalston is responsible for domain.INFO

    tommy99 is responsible for domain.IT

    prab1996 is responsible for domain.ES

    Yashar is responsible for domain.BE

    TOPtActics is responsible for domain.BIZ



    We are full, there is no more space available for now.


  • Seems an interesting and clever plan on your side, but what is the use of this when GSA can only post to, let's say, 5% of the websites you get? You will receive tons of CMSs with no commenting enabled at all, tons of static HTML sites, tons of moderated posting sites, tons of various platforms and dashboards that are no use for backlinks at all, etc. I mean, if you are willing to process a few hundred million links through GSA you may get maybe 1 million unique URLs that GSA can post to, but it will take a lot of work and resources to get it done.


  • Kaine thebestindexer.com
    You are right; however, SER does not linger if it does not recognize or cannot post to a site:

    "No engine found".

    It will also help us scrape in a different way and probably surface sites that we cannot currently find.

    Everyone wants to scrape the same sites as Ahrefs, Moz, or other databases; we bypass that problem and will do what nobody else has done, to my knowledge. In addition, we combine our resources.
  • Interesting.

    If you have it available, send me over one of the lists and I will scrape it with GScraper. I would also like to see the results in GSA with the scraped list.

    Thanks.
  • Kaine thebestindexer.com
    edited March 2014
    It's OK, you are first on the waiting list ;)



    OK, here is the new list to put on the grill.

    This is only available to those who have finished their job. Please do not rush the work; I ask everyone to take this joint work seriously, since we each receive the work of the others. On that note, be careful to select the language for your list.

    The original list is 2.24 GB × 2 and covers .COM.

    Let's call them lists 1 and 2.

    List 1 is therefore 2.24 GB, cut into chunks of one million lines.

    This gives us 13 files of roughly 190 MB each!

    Why are the files so big?

    Because each request sent to Google is crafted to return the maximum number of truncated (root) TLD domains. That way we get more results and less duplication.
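    As a side note, here is a minimal sketch of how such a million-line split can be done (the input name is hypothetical; on Linux, "split -l 1000000" achieves essentially the same thing):

        LINES_PER_FILE = 1_000_000  # one million lines per output chunk

        with open("list1.txt") as src:  # hypothetical name for list 1
            part, out, count = 1, None, 0
            for line in src:
                # Start a new chunk every LINES_PER_FILE lines.
                if count % LINES_PER_FILE == 0:
                    if out:
                        out.close()
                    out = open(f"GScraper-Split-{part}.txt", "w")
                    part += 1
                out.write(line)
                count += 1
            if out:
                out.close()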



    At this time I have already finished domain.FR (to be unlocked for those who finish their job).

    I am now starting on the first file, "GScraper-Split-1.txt".



    I sent a PM to all workers with my Skype address for this project.
    I'm looking for the best way to centralize our work.

    Once a member has done their part, they can access and download all the lists.

    I spoke with Justin, and the best way seems to be access via IP.

    vort3x will probably set up a chat where we can all share our points of view simultaneously.

    The best is yet to come.