URL normalization

URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a “raw” URL into a normalized URL so it is possible to determine if two syntactically different URLs are equivalent.

Our programs normalize URLs according to RFC 3986, following the rules below.


scheme – normalized to lowercase.
userinfo – non-ASCII characters are converted to UTF-8 and percent-encoded.
host – normalized to lowercase; internationalized domain names are converted to IDNA encoding.
port – removed if empty or equal to the scheme’s default port.
path – non-ASCII characters are converted to UTF-8 and percent-encoded; dot segments are removed.
query – non-ASCII characters are percent-encoded using the charset of the page itself.
fragment – non-ASCII characters are converted to UTF-8 and percent-encoded.
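
The rules above can be sketched in a few lines of Python using only the standard library. This is an illustrative approximation, not the code our programs actually use: the names normalize_url, remove_dot_segments, DEFAULT_PORTS, SAFE and USERINFO_SAFE are hypothetical, the default-port table is an assumed subset, the path is assumed to be absolute, and Python's built-in "idna" codec (IDNA 2003) stands in for whatever IDNA variant is really applied.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Assumed subset of default ports; extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

# Characters left as-is when percent-encoding. "%" is kept so that
# escapes already present in the input are not encoded a second time.
SAFE = "%/?:@!$&'()*+,;=-._~"
USERINFO_SAFE = "%!$&'()*+,;=-._~"


def remove_dot_segments(path):
    """Drop "." and ".." segments from an absolute path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # never pop the leading empty (root) segment
                out.pop()
            continue
        out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # a trailing dot segment leaves a trailing slash
    return "/".join(out)


def normalize_url(url, page_charset="utf-8"):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()

    host = parts.hostname or ""  # urlsplit already lowercases the host
    if ":" in host:
        host = "[" + host + "]"  # IPv6 literal: restore the brackets
    elif host:
        host = host.encode("idna").decode("ascii")

    netloc = host
    # Keep the port only if it is present and not the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += ":" + str(parts.port)
    if parts.username is not None:
        userinfo = quote(parts.username, safe=USERINFO_SAFE)
        if parts.password is not None:
            userinfo += ":" + quote(parts.password, safe=USERINFO_SAFE)
        netloc = userinfo + "@" + netloc

    path = quote(remove_dot_segments(parts.path), safe=SAFE)
    # The query uses the page's charset; everything else uses UTF-8.
    query = quote(parts.query, safe=SAFE, encoding=page_charset)
    fragment = quote(parts.fragment, safe=SAFE)
    return urlunsplit((scheme, netloc, path, query, fragment))


print(normalize_url("HTTP://Example.COM:80/a/b/../c/./d?q=é#frag"))
# http://example.com/a/c/d?q=%C3%A9#frag
```

Passing page_charset="latin-1" for the same query would yield q=%E9 instead of q=%C3%A9, which is why the charset of the page has to be known before the query can be normalized.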
