
Unrecognized - GSA Platform Identifier

NetMajom https://buywebgraphic.com | skype: nszmmp
edited May 2015 in GSA Platform Identifier
Hi

I have bought the software. It's very good. (Now I have the whole GSA software arsenal :) ) When I run a project, is it worthwhile to save the unrecognized URLs too? If so, what can I do with them? Will SER recognize them, or should I just leave them?

The Proxies menu is very different from the other GSA programs... Will it stay that way in the future?

Also, I found a bug in the software: the progress bar doesn't update. When a project finished it showed 100%, but when I restart the software I see 0%.

Comments

  • s4nt0s Houston, Texas
    edited May 2015
    1) You can save the unrecognized URLs and rerun them through Pi to see if it detects anything it missed on the first run. You can even set up a second monitor project to watch the unrecognized URLs folder so it identifies those while the other project is running, etc. You can do it however you want, and you don't have to save unrecognized URLs if you don't want to.

    2) No, the proxy scraper will remain the same as it is now, because the other proxy scraper can't be easily converted over to Pi. It would take a lot of tweaking to get it to work. I personally don't use proxies when running Pi, but that's just me :)

    3) If you're running a "monitor" project, the progress bar never completes because it's monitoring and doesn't know if you'll be dropping in more files later. If you're running a regular "process files" project, the progress bar should complete when it's finished. Do you know if you had it set as a monitor project or a process files project?

  • NetMajom https://buywebgraphic.com | skype: nszmmp
    edited May 2015
    @s4nt0s:

    Does Pi know all the platforms that SER knows, and vice versa?

    What is the best way to use the output list with SER: make an .SL file and import that into SER, or use the separate files?

    The monitor project is new to me; I've never used it before. Is it worth using?
  • s4nt0s Houston, Texas
    Yes, Pi uses the same engines as SER, and you can always add engines to Pi's engine folder if you need more.

    Importing the list as a .SL file is probably easiest, or you can use Pi's output as SER's global site lists - the identified folder works too.

    When you click the "new project" button to create a new project, you'll see that one of the first options is selecting the project type. You can choose process files, monitor files/folders, or remove duplicates. Do you know if you selected process files or monitor folder for the project where the progress bar isn't completing all the way?

    I'm running it on my end and I see the progress bar completing fine (using process files).
  • NetMajom https://buywebgraphic.com | skype: nszmmp

    I have found a bug. I use Pi together with GSA Proxy Scraper. In Pi I use only 100 threads (I copy in and refresh the proxy lists manually every 30-60 minutes). Near the end, the project "freezes", and when I try to stop it manually, an error box says "waiting for remaining requests". Some projects do finish fine, though.

    I don't know what the problem is, but it's frustrating that I can't stop projects and they don't finish properly.

    Because of the error box, I have to force-close the software, which I don't think is healthy... I own nearly all the GSA products, and in the others I can stop projects whenever I want without any problem.

    Can you make an update?

    Thx
  • s4nt0s Houston, Texas
    edited May 2015
    @NetMajom - so far this seems to be a problem with running public proxies with Pi. Using public proxies with Pi is not recommended because they are unreliable.
  • I agree, there is really no need for proxies when identifying platforms. They just slow you down.
  • NetMajom https://buywebgraphic.com | skype: nszmmp
    edited May 2015

    I have to use proxies. Yesterday I switched the proxies off to see how Pi runs without them... Then my internet service provider called me: a company in my country reported me (threatening prosecution) because I attacked their site from my IP address, and said that if I don't want harsher measures, I should stop these illegal activities from my computer.

    Is Pi hacking websites? That would be very strange; I thought it only scans and identifies sites. Isn't that so? Then what could have happened?...

    I hope the company doesn't file any police charges against me....
  • magically http://i.imgur.com/Ban0Uo4.png
    LOL

    Relax

    Ask them to prove that such an attack was indeed attempted from your IP.

    And remember, people are using wifi everywhere these days - that means it could have been anyone (if you forgot to encrypt the wifi connection in your home, for instance)...

    In other words, someone might have spoofed your wifi network :D


  • s4nt0s Houston, Texas
    @NetMajom - I can tell you I've been running Pi without proxies since the beginning and have never had any problems. I'm pretty sure the majority of Pi users don't use proxies. Of course there is no "hacking" - it's downloading the page source and matching it to an engine, that's it (see the sketch below).

    If you're going to use proxies, you should probably use shared/private proxies for speed and reliability.
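
    To illustrate what "matching it to an engine" means here, a minimal sketch in Python (not Pi's actual code - the FOOTPRINTS strings and the identify() helper are invented examples; real engine definitions ship with the GSA products):

        import urllib.request

        # Hypothetical footprints for illustration only; real engines are
        # defined in the engine files that ship with SER/Pi.
        FOOTPRINTS = {
            "wordpress": ["wp-content", "wp-includes"],
            "drupal": ["Drupal.settings", "/sites/default/files"],
        }

        def identify(url, timeout=15):
            """Download one page and return the first engine whose footprint matches."""
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    html = resp.read(200_000).decode("utf-8", errors="replace")
            except Exception:
                return None  # unreachable pages end up "unrecognized"
            for engine, marks in FOOTPRINTS.items():
                if any(mark in html for mark in marks):
                    return engine
            return None  # no footprint matched -> unrecognized

    Nothing in that flow posts data or probes for vulnerabilities; it is one GET request per URL plus a string match.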
  • NetMajom https://buywebgraphic.com | skype: nszmmp
    @s4nt0s - Yes, I know that too, but the provider told me the exact time, and at that time I was running only Pi, without proxies. If Pi doesn't attack sites, then what could have happened? Because when a company reports or prosecutes a person over an attack, it's not a joke...

    Sorry for the question; I just need to know so I can avoid this in the future, because that phone call about the attack was really heated....
  • s4nt0s Houston, Texas
    @NetMajom - I really can't tell you why it happened, but you can get reports like that running any tool like Scrapebox, Gscraper, SER, etc. Some ISPs are strict about that, I guess, but most people run these tools on a VPS/dedicated server and not on their home PC.

    Maybe they don't like the bandwidth being used, so they make something up to get you to slow down/stop. I'm really not sure...
  • Trevor_Bandura 267,647 NEW GSA SER Verified List
    This happened to me also. I use a dedi server from @solidseovps, and one site contacted them because I accessed a 404 page on their website a couple of times.

    Pi is only identifying sites, so maybe the owner of that site had nothing better to do than contact the service providers of every IP that hit a page-not-found on their server, thinking I was trying to hack it or something.

    I've used proxies since then, but yes, proxies slow Pi way down. I hate using them, but what can I do.
  • s4nt0s Houston, Texas
    @Trevor_Bandura - Thanks for letting us know. Ya, unfortunately nothing can be done if it accessed a 404 page, since Pi only accesses the URLs that are fed to it. If we added an "alive check" it would slow things down A LOT (see the sketch below). It's not crawling sites and finding more pages or anything like that.

    If you do have to use proxies, shared/private is recommended.
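
    For context, an "alive check" would mean something like the sketch below (illustration only - this is not a Pi feature, and the is_alive() helper is hypothetical). It adds a full extra request per URL before identification even starts, which is the slowdown meant above:

        import urllib.request

        def is_alive(url, timeout=10):
            """Send a HEAD request and report whether the URL answers with a non-error status."""
            req = urllib.request.Request(url, method="HEAD")
            try:
                with urllib.request.urlopen(req, timeout=timeout) as resp:
                    return resp.status < 400
            except Exception:
                return False  # timeouts, DNS failures, 4xx/5xx all count as "dead"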
  • Trevor_Bandura 267,647 NEW GSA SER Verified List
    edited May 2015
    @s4nt0s I'm using dedicated proxies. I wish I didn't have to, but @solidseovps said to use them. I know the report I got was really nothing to worry about, but it's better to be safe and not get the servers I have with them shut down.

    I just don't understand why someone would waste their time contacting a service provider because a 404 page was accessed. Makes no sense to me.

    But either way, I love Pi, and it has helped me tremendously in building bigger and better quality lists in SER by importing already identified sites to process.
  • @Trevor_Bandura

    Recently a lot of websites have been doing this as a business: they list your IP and ask you to pay for delisting, and this was one of them. We still have to forward any claim to our clients and have it fixed, as 99% of claims come to us through our datacenters, which require the issue to be resolved and not repeated.
  • @NetMajom, I suggest you remove duplicate URLs and sort them in a randomized order. That way you don't hammer any one site by scanning hundreds or thousands of its pages simultaneously (a sketch of this pre-processing follows below).

    For contextual sites, remove duplicate domains. That way you only scan one page per domain.

    Hope this helps...
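
    A rough sketch of that pre-processing in Python (an assumed illustration, not a Pi feature - the prepare() helper is hypothetical):

        import random
        from urllib.parse import urlparse

        def prepare(urls, one_per_domain=False):
            """Dedupe a URL list (optionally one URL per domain) and shuffle it."""
            seen, out = set(), []
            for url in urls:
                # For contextual sites, key on the domain so only one page per domain survives.
                key = urlparse(url).netloc.lower() if one_per_domain else url
                if key not in seen:
                    seen.add(key)
                    out.append(url)
            random.shuffle(out)  # randomize order so hits to any one host are spread out
            return out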

  • gsa8mycows forum.gsa-online.de/profile/11343/gsa8mycows
    @solidseovps

    Is there any way to set the DNS settings so that an rDNS lookup would not reveal whom an IP belongs to? Like you could set up a fake hosting company and have these narks send their automated abuse messages there.

    Due to my misconfiguration and half-assing SER, my VPS providers received abuse complaints. They suspended me once and warned me twice more. Why do service providers take these automated complaints seriously?

    "Recently there is alot of websites that doing this for business, like
    would list your ip and ask you to pay for delisting, this was one of
    them. We still have to forward any claim to our client and have it
    fixed, as 99% of any claims come to us through our datacenters in which
    we are required to have this solved and not repeated."
  • Good day. Please tell me what path should be entered in the settings for the destination folder, and what happens when a project is created with "save to a single file"? When I create a project, I need to use the filter-by-keywords tool, but the filtered links are not saved to the single file. How can I solve this problem? Here are screenshots of the settings - where is the error?
    http://joxi.ru/823xvMaI6GZREA
    http://joxi.ru/823xvMaI6GZvEA
    http://joxi.ru/DmBLZ5QhN10bLA
    As seen in the screenshots, links are recognized in the project, but nothing appears in the folder.
  • s4nt0s Houston, Texas
    @alexey - it should be saved, but if you choose the option to save to a single file, then everything filtered will be saved to that one file only. Otherwise, choose the per-file option and it will save them separately.
  • That's exactly the problem: after the update, the filtered links no longer end up in the file. It only works now if I tick the extended matching checkbox; only in that case does it save the links.

    I'm sorry that I write through Google Translate. I don't know English =)
  • s4nt0s Houston, Texas
    edited April 2016
    @alexey - Ok, I just tested on my end using your settings and I see the problem now. We will fix it and push an update, thanks for reporting. 

    @alexey - Please update to latest version, it should be fixed. 
  • alexey Moscow
    Thank you for your excellent work!