Do not unnecessarily encode colons in URIs/IRIs #14

@wouterbeek

The URI library currently percent-encodes colons in both the path and the query component, although neither component requires it.

Colons in query components

In Semantic Web services it is very common to include IRIs in the query component, e.g., to indicate a selection or query. uri_query_components/2 encodes colons in the query component, even though this is not necessary. In the following example, %3A should simply be :. The # is legitimately encoded as %23, because it would otherwise be confused with the fragment component separator.

?- uri_query_components(Query, [predicate('http://www.w3.org/1999/02/22-rdf-syntax-ns#type')]).
Query = 'predicate=http%3A//www.w3.org/1999/02/22-rdf-syntax-ns%23type'.
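To make the expected behaviour concrete, here is a minimal sketch of an encoder that leaves colons alone in a query value. The predicate names (query_value_encoded/2 and its helpers) are hypothetical and not part of library(uri). The sketch assumes ASCII-only input, and the set of characters left unencoded (unreserved characters plus ":", "/", "?" and "@") is an illustrative choice, not the library's actual table.

```prolog
:- use_module(library(lists)).

% Hypothetical helper, not part of library(uri).
% query_value_encoded(+Value, -Encoded): percent-encode an ASCII atom
% for use as a query value, leaving ":" and "/" unencoded.
query_value_encoded(Value, Encoded) :-
    atom_codes(Value, Codes),
    encode_codes(Codes, EncodedCodes),
    atom_codes(Encoded, EncodedCodes).

encode_codes([], []).
encode_codes([C|Cs], Encoded) :-
    (   query_value_safe(C)
    ->  Encoded = [C|Rest]
    ;   High is C >> 4,
        Low is C /\ 15,
        hex_digit(High, H1),
        hex_digit(Low, H2),
        Encoded = [37, H1, H2|Rest]   % 37 is the character code of "%"
    ),
    encode_codes(Cs, Rest).

% Assumption: only codes below 256 occur, so both nibbles are in 0..15.
hex_digit(N, Code) :-
    atom_codes('0123456789ABCDEF', Digits),
    nth0(N, Digits, Code).

% Characters this sketch leaves unencoded in a query value.
query_value_safe(C) :- unreserved(C).
query_value_safe(C) :- atom_codes(':/?@', Cs), memberchk(C, Cs).

unreserved(C) :- C >= 0'a, C =< 0'z.
unreserved(C) :- C >= 0'A, C =< 0'Z.
unreserved(C) :- C >= 0'0, C =< 0'9.
unreserved(C) :- atom_codes('-._~', Cs), memberchk(C, Cs).
```

With this sketch, the value from the example above keeps its colon and only the # is encoded:

```prolog
?- query_value_encoded('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', E).
E = 'http://www.w3.org/1999/02/22-rdf-syntax-ns%23type'.
```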

Colons in path components

Colons are not very common in IRI paths, but some datasets (e.g., DBpedia) do use them. iri_normalized/2 unnecessarily encodes colons in paths, e.g., translating [1] to [2].

[1]   'http://dbpedia.org/resource/Category:Politics'
[2]   'http://dbpedia.org/resource/Category%3APolitics'

Reference (RFC 3986, Section 3.3)

path = path-abempty    ; begins with "/" or is empty
     / path-absolute   ; begins with "/" but not "//"
     / path-noscheme   ; begins with a non-colon segment
     / path-rootless   ; begins with a segment
     / path-empty      ; zero characters
path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>
segment       = *pchar
segment-nz    = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
              ; non-zero-length segment without any colon ":"
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
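The pchar rule above can be turned into a small check that tells whether a path segment may be left as it is. This is only a sketch: rfc_pchar/1, rfc_unreserved/1, rfc_sub_delim/1 and segment_needs_no_encoding/1 are hypothetical names, the pct-encoded alternative is deliberately left out (so a literal "%" is rejected), and only ASCII is considered.

```prolog
:- use_module(library(lists)).

% Hypothetical helpers, not part of library(uri).
% rfc_pchar(+Code): Code may appear unencoded in a path segment.
% This covers unreserved / sub-delims / ":" / "@"; pct-encoded is omitted.
rfc_pchar(C) :- rfc_unreserved(C).
rfc_pchar(C) :- rfc_sub_delim(C).
rfc_pchar(0':).
rfc_pchar(0'@).

rfc_unreserved(C) :- C >= 0'a, C =< 0'z.
rfc_unreserved(C) :- C >= 0'A, C =< 0'Z.
rfc_unreserved(C) :- C >= 0'0, C =< 0'9.
rfc_unreserved(C) :- atom_codes('-._~', Cs), memberchk(C, Cs).

% sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
rfc_sub_delim(C) :- atom_codes('!$&''()*+,;=', Cs), memberchk(C, Cs).

% segment_needs_no_encoding(+Segment): every character of the atom
% Segment is a pchar, so the segment can be emitted unchanged.
segment_needs_no_encoding(Segment) :-
    atom_codes(Segment, Codes),
    forall(member(C, Codes), rfc_pchar(C)).
```

Under these assumptions, segment_needs_no_encoding('Category:Politics') succeeds, while a segment containing "#" or "/" fails because neither character is a pchar.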
