How is site type determined?

I loaded over half a million URLs into the site lists yesterday, and around 135,000 of them were identified as a known type. My connection was bad yesterday, though, so that seems like a lot of sites for a poor home broadband connection to work out.

Does it connect to each site and read a meta tag or header, or something similar, to determine the site type?


    It downloads the page and looks for certain footprints in the HTML code. These footprints are defined in each engine's .ini file.
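To illustrate the idea in the answer, here is a minimal Python sketch of footprint-based detection. It is not the tool's actual code. The `.ini` layout is a hypothetical stand-in: the `footprints` key, the `|` separator, the engine sections, and the footprint strings are all assumptions, as is the rule that a single matching footprint is enough to classify a page. Each URL costs one page download followed by plain substring checks, so even a slow connection only has to fetch the HTML once per site.

```python
import configparser
import urllib.request

# HYPOTHETICAL engine definitions. Real per-engine .ini files have their own
# keys and syntax; the "footprints" key and "|" separator are assumptions.
ENGINE_INI = """
[WordPress]
footprints = wp-content/|wp-includes/

[phpBB]
footprints = Powered by phpBB|viewtopic.php
"""


def load_footprints(ini_text: str) -> dict[str, list[str]]:
    """Parse engine sections into {engine name: [footprint, ...]}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        section: [fp.strip() for fp in parser[section]["footprints"].split("|") if fp.strip()]
        for section in parser.sections()
    }


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page's HTML (one request per URL)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def detect_engines(html: str, footprints: dict[str, list[str]]) -> list[str]:
    """Return every engine with at least one footprint in the HTML.

    Case-insensitive substring matching; the "any one footprint matches"
    rule is an assumption made for this sketch.
    """
    page = html.lower()
    return [
        engine
        for engine, fps in footprints.items()
        if any(fp.lower() in page for fp in fps)
    ]
```

Example usage (without the network step): `detect_engines('<link href="/wp-content/themes/x.css">', load_footprints(ENGINE_INI))` yields `["WordPress"]`. For a live site you would pass the result of `fetch_html(url)` instead.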
