Log from the custom proxy with output of request object and selection of fields
2024-08-02 15:51:26,940 - pid:209673 [I] plugins.load:89 - Loaded plugin proxy.http.proxy.HttpProxyPlugin
2024-08-02 15:51:26,940 - pid:209673 [I] plugins.load:89 - Loaded plugin __main__.RequestPlugin
Request _url: www.w3id.org:443
Request.method: b'CONNECT'
Request protocol: None
Request host: b'www.w3id.org'
Request path: None
Request properties: {'state': 6, 'type': 1, 'protocol': None, 'host': b'www.w3id.org', 'port': 443, 'path': None, 'method': b'CONNECT', 'code': None, 'reason': None, 'version': b'HTTP/1.1', 'total_size': 116, 'buffer': None, 'headers': {b'host': (b'Host', b'www.w3id.org:443'), b'user-agent': (b'User-Agent', b'curl/7.81.0'), b'proxy-connection': (b'Proxy-Connection', b'Keep-Alive')}, 'body': None, 'chunk': None, '_url': , '_is_chunked_encoded': False, '_content_expected': False, '_is_https_tunnel': True}
Request _url: /simulation/ontology/
Request.method: b'GET'
Request protocol: None
Request host: None
Request path: b'/simulation/ontology/'
Request properties: {'state': 6, 'type': 1, 'protocol': None, 'host': None, 'port': 80, 'path': b'/simulation/ontology/', 'method': b'GET', 'code': None, 'reason': None, 'version': b'HTTP/1.1', 'total_size': 96, 'buffer': None, 'headers': {b'host': (b'Host', b'www.w3id.org'), b'user-agent': (b'User-Agent', b'curl/7.81.0'), b'accept': (b'Accept', b'*/*')}, 'body': None, 'chunk': None, '_url': <proxy.http.url.Url object at 0x7f50b472bc10>, '_is_chunked_encoded': False, '_content_expected': False, '_is_https_tunnel': False}
2024-08-02 15:51:35,421 - pid:209683 [I] server.access_log:388 - 127.0.0.1:59292 - CONNECT www.w3id.org:443 - 683 bytes - 680.55ms
Request _url: http://www.w3id.org/simulation/ontology/
Request.method: b'GET'
Request protocol: None
Request host: b'www.w3id.org'
Request path: b'/simulation/ontology/'
Request properties: {'state': 6, 'type': 1, 'protocol': None, 'host': b'www.w3id.org', 'port': 80, 'path': b'/simulation/ontology/', 'method': b'GET', 'code': None, 'reason': None, 'version': b'HTTP/1.1', 'total_size': 145, 'buffer': None, 'headers': {b'host': (b'Host', b'www.w3id.org'), b'user-agent': (b'User-Agent', b'curl/7.81.0'), b'accept': (b'Accept', b'*/*'), b'proxy-connection': (b'Proxy-Connection', b'Keep-Alive')}, 'body': None, 'chunk': None, '_url': <proxy.http.url.Url object at 0x7f50b4520610>, '_is_chunked_encoded': False, '_content_expected': False, '_is_https_tunnel': False}
2024-08-02 15:51:44,094 - pid:209688 [I] server.access_log:388 - 127.0.0.1:49428 - GET www.w3id.org:80/simulation/ontology/ - 301 Moved Permanently - 573 bytes - 479.09ms
Describe the bug
The field values of the request object for a GET request differ between plain http and intercepted https in an inconsistent and potentially incorrect manner. This hinders the correct implementation of a custom archive redirection plugin.
- `protocol` is always None. Is this field supposed to carry the scheme? If not, how can a plugin tell whether a request is https?
- `_url` is incomplete (missing scheme and host) for https requests.
- `host` is None for https requests.
- `port` is incorrectly reported as 80 for https requests (which are actually sent to 443).
To Reproduce
The issue can be reproduced by running `poetry install`, then `poetry shell`, then `python proxy/request_proxy.py` in the root dir of https://github.com/kuefmz/https-interception-proxypy/
Expected behavior
- `protocol` should not be None but http or https?
- `_url` should be the full request URL (including scheme and host) for intercepted https requests.
- `host` should be the FQDN for https requests, as it is for http requests.
- `port` should be reported correctly.
Version information
Additional context
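A possible interim workaround, shown only as a sketch: rebuild the full URL from the fields that are populated, falling back to the `Host` header and to the target of the preceding CONNECT request. The helper below does not use any proxy.py API. It works on a plain dict shaped like the logged `Request properties`. The names `effective_url` and `tunnel_target` are hypothetical, and the sketch assumes the plugin can remember the CONNECT host and port for the connection on which the intercepted request arrives.

```python
from typing import Optional, Tuple


def effective_url(fields: dict,
                  tunnel_target: Optional[Tuple[bytes, int]] = None) -> str:
    """Rebuild the full request URL from logged request fields.

    fields:        dict shaped like the 'Request properties' log output.
    tunnel_target: (host, port) of the preceding CONNECT request, or None
                   for a plain http request (hypothetical bookkeeping that
                   the plugin would have to do itself).
    """
    headers = fields.get("headers") or {}
    # Header values are logged as (original name, value) tuples.
    host_header = headers.get(b"host", (b"Host", b""))[1]
    header_host = host_header.split(b":")[0]  # ignores IPv6 literals

    if tunnel_target is not None:
        # Intercepted https: trust the CONNECT target, not the parsed port.
        scheme = "https"
        host, port = tunnel_target
    else:
        scheme = "http"
        host = fields.get("host")
        port = fields.get("port") or 80

    host = host or header_host
    path = fields.get("path") or b"/"

    default_port = 443 if scheme == "https" else 80
    netloc = host.decode()
    if port != default_port:
        netloc = f"{netloc}:{port}"
    return f"{scheme}://{netloc}{path.decode()}"


# Field values copied from the log (only the keys the helper reads).
intercepted = {
    "host": None,
    "port": 80,
    "path": b"/simulation/ontology/",
    "headers": {b"host": (b"Host", b"www.w3id.org")},
}
plain = {
    "host": b"www.w3id.org",
    "port": 80,
    "path": b"/simulation/ontology/",
    "headers": {b"host": (b"Host", b"www.w3id.org")},
}

print(effective_url(intercepted, tunnel_target=(b"www.w3id.org", 443)))
print(effective_url(plain))
```

With the values from the log, this prints `https://www.w3id.org/simulation/ontology/` for the intercepted request and `http://www.w3id.org/simulation/ontology/` for the plain one. This only papers over the problem in plugin code; the request object itself would still report `host` as None and `port` as 80 for the intercepted request.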