“CRAWL.PL”: Measuring Statistical and Structural Properties of the Polish Web. Technical Report
Abstract
This document summarizes the results of an experiment conducted at the Polish-Japanese Institute of Information Technology, Warsaw, Poland, during autumn 2005 and winter 2006. The goal of the project was to collect and analyze a large portion of Polish Web documents in order to characterize the structure and other properties of the “.pl” domain. To the best of the authors' knowledge, it was the first publicly reported research experiment of this kind on the Polish Web. The following sections describe the downloaded Web pages and Web sites and their characteristics. We also present various statistics concerning hosts and domains, as well as the link structure. Among the results of the experiment are the first data sets representing graphs of the Polish Web, which will be made publicly available to other researchers.