URL normalization
URL normalization is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a "raw" URL into a normalized URL so it is possible to determine if two syntactically different URLs are equivalent.
Our programs follow the rules below to normalize URLs according to RFC 3986.
scheme - normalized to lowercase.
userinfo - non-ASCII characters are converted to UTF-8 and percent-encoded.
host - normalized to lowercase; internationalized domain names are converted to their IDNA (ASCII) form.
port - removed if it is empty or equal to the scheme's default port.
path - non-ASCII characters are converted to UTF-8 and percent-encoded; dot segments ("." and "..") are removed.
query - non-ASCII characters are percent-encoded using the charset of the page the URL was found on.
fragment - non-ASCII characters are converted to UTF-8 and percent-encoded.
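The rules above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the code our programs actually run: the names normalize_url, remove_dot_segments and DEFAULT_PORTS, the page_charset parameter, and the short list of default ports are all assumptions made for the example. It also assumes an absolute URL with an authority part, and it uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Assumed default ports for a few common schemes; extend as needed.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

# Characters quote() leaves alone, besides unreserved ones. "%" is kept
# so that existing percent-escapes are not encoded a second time.
SAFE_USERINFO = "%:!$&'()*+,;="
SAFE_PATH = "/%:@!$&'()*+,;="
SAFE_QUERY = "/?%:@!$&'()*+,;="


def remove_dot_segments(path):
    """Drop "." and ".." segments from an absolute path (RFC 3986, 5.2.4)."""
    segments = path.split("/")
    output = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == ".":
            if last:
                output.append("")  # keep the trailing slash
        elif seg == "..":
            if len(output) > 1:  # never pop the leading empty segment
                output.pop()
            if last:
                output.append("")
        else:
            output.append(seg)
    return "/".join(output)


def normalize_url(url, page_charset="utf-8"):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()

    # userinfo: everything before the last "@" in the authority.
    raw_userinfo, at, _ = parts.netloc.rpartition("@")
    userinfo = quote(raw_userinfo, safe=SAFE_USERINFO) if at else ""

    # host: lowercase, then IDNA-encode if it contains non-ASCII characters.
    host = (parts.hostname or "").lower()
    if not host.isascii():
        host = host.encode("idna").decode("ascii")
    if ":" in host:  # IPv6 literal: urlsplit strips the brackets
        host = "[" + host + "]"

    # port: parts.port is None when the port is missing or empty.
    netloc = userinfo + "@" + host if userinfo else host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += ":" + str(parts.port)

    path = quote(remove_dot_segments(parts.path), safe=SAFE_PATH)
    query = quote(parts.query, safe=SAFE_QUERY, encoding=page_charset)
    fragment = quote(parts.fragment, safe=SAFE_QUERY)
    return urlunsplit((scheme, netloc, path, query, fragment))


print(normalize_url("HTTP://Example.COM:80/a/./b/../c"))
# prints: http://example.com/a/c
```

Note that only the query uses page_charset; userinfo, path and fragment are always encoded as UTF-8, matching the rules listed above. A character that cannot be represented in the page charset raises an error in this sketch.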