I am uploading JSON content to an Elasticsearch index using Python's http.client. The data is inserted successfully, but I'm having a character-encoding issue: once inserted, special characters like é come out garbled (as Ã©).
Here is the code:
import http.client
connection = http.client.HTTPConnection(elastic_address)
headers = {"Content-type": "application/json", "Accept": "text/plain"}
connection.request('PUT', url=endpoint, headers = headers, body=json_data.encode('utf-8'))
I have noticed that if I replace the special characters in the source JSON before sending it (for example, é replaced by \u00E9), it works fine. It may be because Elasticsearch uses another character encoding, but according to this link, ES uses UTF-8.
I also looked through client.py in the http.client package, and it seems that the data is encoded in Latin-1; see below:
def _encode(data, name='data'):
    """Call data.encode("latin-1") but show a better error message."""
    try:
        return data.encode("latin-1")
    except UnicodeEncodeError as err:
        raise UnicodeEncodeError(
            err.encoding,
            err.object,
            err.start,
            err.end,
            "%s (%.20r) is not valid Latin-1. Use %s.encode('utf-8') "
            "if you want to send it encoded in UTF-8." %
            (name.title(), data[err.start:err.end], name)) from None
I'm not sure where the issue is: in the script, in the http.client package, or in the Elasticsearch index settings?
Any ideas?
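One way to narrow this down, offered as a minimal sketch rather than a diagnosis: the symptom matches UTF-8 bytes being read back as Latin-1 at some step. The snippet below reproduces that effect with the standard library only, and shows why the \u00E9 form survives (it is pure ASCII, so no decoding step can mangle it). The sample document `doc` is made up for illustration.

```python
import json

# UTF-8 bytes for "é", then the same bytes misread as Latin-1.
utf8_bytes = "é".encode("utf-8")            # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("latin-1")      # each byte becomes one character

# Hypothetical document, only for illustration.
doc = {"name": "café"}

# Default json.dumps escapes non-ASCII characters, so the body is pure ASCII.
ascii_body = json.dumps(doc)

# Keeping the raw characters and encoding explicitly as UTF-8 bytes.
utf8_body = json.dumps(doc, ensure_ascii=False).encode("utf-8")

print(repr(misread))
print(ascii_body)
print(utf8_body)
```

http.client only applies the Latin-1 encoding to `str` bodies; a `bytes` body is sent unchanged. So if `json_data` is already garbled before `.encode('utf-8')` is called (for example because the source file was read with the wrong encoding), the request code itself may not be the culprit.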
I have to send a POST request to the /batch endpoint of 'https://www.google-analytics.com'.
As mentioned in the documentation, I have to send the request to the /batch endpoint and specify each payload on its own line.
I was able to achieve this using Postman as follows:
My question is how to make this POST request using Python's requests library.
I tried something like this:
import requests
text = '''v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=bookmarks&ev=13
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=upvotes&ev=65
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=questions&ev=15
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=postviews&ev=95'''
response = requests.post('https://www.google-analytics.com/batch', data=text)
but it doesn't work.
UPDATE
I tried this and it works!
import http.client
conn = http.client.HTTPSConnection("www.google-analytics.com")
payload = "v=1&cid=43223523&tid=UA-200248207-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=bookmarks&ev=13\r\nv=1&cid=43223523&tid=UA-200248207-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=upvotes&ev=63\r\nv=1&cid=43223523&tid=UA-200248207-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=questions&ev=11\r\nv=1&cid=43223523&tid=UA-200248207-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=postviews&ev=23"
headers = {
    'Content-Type': 'text/plain'
}
conn.request("POST", "/batch", payload, headers)
res = conn.getresponse()
But the question remains open: what's the issue with requests here?
You don't need to double-escape the newline symbol.
Moreover, you don't need the newline symbol at all for the multi-line string.
Also, any indentation you put inside a multi-line string is counted as part of the string:
test = '''abc
    def
    ghi'''
print(test)
Here's an SO answer that explains this, with some additional ways to write long strings: https://stackoverflow.com/a/10660443/4570170
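To make the point about indentation concrete, here is a small sketch using only the standard library. It shows that leading spaces inside a triple-quoted string end up in the string, and one common way (`textwrap.dedent`) to strip them again; the variable names are just for illustration.

```python
import textwrap

# Indentation inside the literal is part of the string.
indented = '''abc
    def
    ghi'''

# dedent removes the whitespace common to all lines; the backslash
# after the opening quotes avoids a leading empty line.
dedented = textwrap.dedent('''\
    abc
    def
    ghi''')

print(repr(indented))
print(repr(dedented))
```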
Now the request body.
The documentation says
payload_data – The BODY of the post request. The body must include exactly 1 URI encoded payload and must be no longer than 8192 bytes.
So try URI-encoding your payload:
text = '''v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=bookmarks&ev=13
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=upvotes&ev=65
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=questions&ev=15
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=postviews&ev=95'''
text_final = requests.utils.quote(text)
response = requests.post('https://www.google-analytics.com/batch', data=text_final)
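It may be worth checking what quoting the whole body produces before sending it. As far as I know, `requests.utils.quote` is the standard library's `urllib.parse.quote`, so the sketch below uses that directly and needs no network access. The two-line `body` is a shortened stand-in for the payload above, and the `"daily kpi"` value is made up to show a character that needs escaping.

```python
from urllib.parse import quote, urlencode

# Shortened stand-in for the batch payload.
body = "v=1&el=bookmarks&ev=13\nv=1&el=upvotes&ev=65"

# Quoting the whole string also escapes '=', '&' and the newline.
quoted_whole = quote(body)

# urlencode escapes only keys and values and keeps the separators.
line = urlencode({"v": 1, "el": "daily kpi", "ev": 13})

print(quoted_whole)
print(line)
```

If the server expects `key=value&key=value` pairs on each line, encoding the values individually is probably closer to what the documentation means by a URI-encoded payload, but that is an interpretation, not something confirmed here.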
Finally, I figured out the solution myself.
Posting an update in case it helps others.
The problem was that I was working on AWS Cloud9 and, as mentioned in the documentation:
Some environments are not able to send hits to Google Analytics directly. Examples of this are older mobile phones that can't run JavaScript or corporate intranets behind a firewall.
So we just need to include the User Agent parameter
ua=Opera/9.80
in each of our payloads.
It works!
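For anyone assembling the batch body in code, here is a minimal sketch of building one payload per line with the `ua` parameter included in each. The helper name `build_batch` and the `COMMON` dictionary are my own; the parameter values are the placeholders from the question. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Parameters shared by every hit; values are the placeholders used above.
COMMON = {
    "v": 1,
    "cid": "43223523",
    "tid": "UA-XXXXXX-1",
    "t": "event",
    "ec": "aggregated_stats",
    "ea": "daily_kpi",
    "ua": "Opera/9.80",  # user agent override, one per payload
}

def build_batch(events, common=COMMON):
    """Return the request body: one URL-encoded payload per line."""
    lines = [
        urlencode({**common, "el": label, "ev": value})
        for label, value in events
    ]
    return "\n".join(lines)

payload = build_batch([("bookmarks", 13), ("upvotes", 65)])
print(payload)
```

The resulting string can then be passed as the request body, for example as `data=payload` in `requests.post` or as the body argument of `conn.request`. Note that `urlencode` escapes the slash in the user agent as `%2F`.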
I'm capturing HTTP requests made by Selenium using BrowserMob Proxy (through the browsermob-proxy Python package). In my HAR file, I see this (it should be a JavaScript file):
"content": {
"comment": "",
"size": 10908,
"mimeType": "application/x-javascript; charset=utf-8",
"encoding": "base64",
"text": "4qF2FJIUvnlRlDNSpoB6KeBkiFADdf4D8QYKthNRVc+tMqHv1vuipJbkMKsgPuuM4/FG3RVcLlYhxHmH35SDP/NTPTs+PWuP0OSUmpep2uur7Gg0QAyJLhUO46Fajm7qKW2HIkAKFgUwAKh2pplqVvWqnKqQtoMInI5jpiEhh7QO8b+zQ+HSllatf4Oq8hdgAzWVjJFBy4K7J02g0CWJaE3ikUbWh96mImD0tVD/hsgTm03zkZSKPTbyuux6ZjVB07R8fS9NIHHcSSsOkdejy1htoCuWa6qNMM7z2VERAsm2YnW3oRpFsJkUvrT3/99X7Z1ffUj6En9wyp2bLmQNl4tmOqemFs7Z517zBcDzAJBLIEGuERUmkNLE8M4NjyCCDEDgMIy0VvwpaBxCrmLponHp1adYxq6kOjeldTMD6cmsD03id+swTblb2+2X1oTXgDEGDCHbH9Pv+feptbckYYUdiFQCJAkgFKl3f8tnz3Np6clo7Ew75vzG+LaQOB1e21ynJe35eaQA+I2sEYlCgm6h+3cbWLVx4P2jO4754x/xT0lPemYveuX+jAjSCQeYXsCHqHK+xS9AIgSCvgDnhNzk3IcLxJEmwqIAJz+ccdWDC0AiQxA8QvMlzrniwQWgl6P//xfAEdO2ruZucLVYmUp6L8E5KE8gPqpjGVdkWCnJMJDqyq3bEkxdVBdt/dWpFB/n7JTjKAWquRsjYY49sj6MU0mu8+tOomC5okbz1zk+mvUs80FeweRRKRYW4FyjaRrQBbAOCQRMnIgzT/64QJrkePBXCl7JmhzbXBUevKs1h14bjs1sUmtKqH+rLB0fQUqwhDTrVBagWlKH5wlnIY2dCSIkOfKSIVexLI+XuLxVOn/IghTLTaLnbwpql4VSlGrJbyLmJMdECxS2wSLYQVOELiAoMBGGbBd9sD0GepfYYnCfAeElLGvqTq0ShuoNNuhNfKW/rO/0l/Wl/rC+xRoqZxy6pA2MmmIat8dp0tUbzfRBSHijlOJhbcHmNh9+x7v4mozNGIBkYoahaA2zpiBeNhj0Ej0RA6wKLIDyB4EPPygG2w9rPs1RumMd5jI8FKD9dFvMjNo0aIbaV0L8GeSZbYL3U6tOHJDs6m8CTv3fJO83U2oqb0Z/OqF3s6YaofjpKWJodgmTDYQoe34GirhT48/d6KtVqYeopRxkma1jkwh3VvyLbZKt7EiC218Pgzhd8S9dRwZyWuVy7DZNVTfnhuAdstuEMJE3buPKTWXxh2gHe3vMOMAu/UgDJEI2dWlPZY9lMxAbZhX6mmZS4nBJ6pmMsC399L5U8P5wXF52C/gLHJyh0gG/tEYaCnTYOkGBMxeKvdVCCzhjF9F2N0FEqpda6AbF5dqEcqA9LeVWfVB593KW5d0UNzLZfKA7KkyIaxv0frIe3pArWTII45gE6bKRfKOds+qY2Rw7d25jzdM+izE3iwAV/yVQjfsqQyzYUD9og43+lKQ31pvqh2OPUusJtMwgwRwzgsrNJV6+EfV6wxTI5gpTLWtDOBlwqzc9mDSPP8s8V9Bk9K9AZEDwc4liGpTxJORUggJzI8wi63Ef9VnABBvbgqTMVRCaits9Ct36UWYhFA+E9JUzaxjRYdE5lB+fFUHiMGpmLtBIQ7tMvtgYZ4wOTYE6mrr1+HEAouFCGmM+U+YiYDMYFfyAPIxNYPLnAYMo6ris9QAh/MsP5ZHJFuCOaq1CCFVqX/8omOL/63CN95GEbmXfoi3yOx75uOmMBbrSIFimVA93Eu0ybKo4B2l1+W5LnpfwQizbfODnn9wJqKHB+N5bOXAGzdBZySrgfmmAToVL1Ia3phOZmSD+pP0TT7HF9vsBiwI6iAFgWFvCCBEz/kApOkCJOdrn9ddVSBTu54yYizFgKXyX1qfvgvEn9vd2O/dQ85tAZwdLyvRwYFKNYFvUUx4XcRXt0GF/iscrGnN2lNeMHZm5UYa8lz55jWX2bwKLeRl4GmD
a20GCk0USvplft02D68R422UEzqQSzNPyB0GtKGa2ZR9/OfSG1V39yuRrs5xAC00z6N39/A7IEoAIJKtzM2cmKGwNFEjArbJxhZmdT/+8PD3d/8JNVVuuEQCt+dDQMQx2FBnw8zxHvKW822DM8mAS5686v5o9TqXd92/4VPmJzhg4lZCLFswnkLBMzBHm9LxosFcZ+APh2LdZWAJ2ppaUN2t1Jh3SbSmCl+N5sskINxsLepYv+f6uDVUPgPaL99WtdmH4oR08VL4fbth53mUoc4/KdzjMRUfTZJSowxll0JOkO+Dc+qZtwstk8ctkObmNPWQpnJ4QCn1USRhHKI2kWMyNEur9puecpciC8oBl4inEpZf2dMfJwUkIs9RfGLXXPQzdd0iBdr3txSXhLvCHNQmmY6MSNyZJ/VyMkBpNgmra2XPFCydXP705ekqDEtUDEmirRFjydKAabA/xAtWPRSV+CqNMhh9JlCWN9yDMgsIGEeZoUySQrWfDbl58t0TbLZGoVe6/uCSv1Lndd1cJVUxXZXPA9w5ygglMnFt7XjidPxOkxBnsYzV0iMN4mxImbYy5Bp9rAUG5tKLty1c9r1aez3lWU+aQy55P+bPkyfawyx6tP0imJ84v+fnxaZy18/YgpjL4t1UAYIvEEiDkGsHIWCaW2XOJRufXTqqkUJmSgl7eCQt/dDn8dtdEaqlHqVPZAa7ggcsMsB5LmEPXp2ZqTz60HOzikHF0ShLoV5+Kz96bvEZclCTy+TMw6Q9PzqiFHgd/MHXHeDrUomtAMwc4J7uOHg4nwH/q3PAEaJPHZOHjL2D8OuRNvXTAxfQZp39ezxP8NfzKkdXCkAwrh+jnzNRtF/uFX67Oloz6Dve6ygxHsCTozqZM+YRmEXNerv6AkP7JX8zBDET+21359HP9+yIuhuoQ+wr7Zhj7RV2clQ5Kn3Zfkd864Kc66/PiUK5KfbOFqJmDgeKYdnGbOU7BjZgepm7fouQvbZ7KBXN/eXt1yorXHflfXsriP1mU0rrOXCs2HakKCRhiu3b/HDRsStjQ26Wy6FEV7d7kqbYVxs66jBHlp7DGfpiChZgJotfZ40nzGUHurzkDovnN1bKv7no//OvPf/+G3dXd9XTv5dv+Zvt63d2vx3K8j3tf12uZRzj6duh/6cv+3c8vP6/b/c234/XLza3+se6tsvW5jM6+2bXW8/nXN+BvcvcH8DaMkaoQpW6OPFBwyCFo0JjMyidiKJb6of+K4PNcyZrDdNLifm82ZykFi6td9itE3YLj1DZ/Q0C6ya4aKKfbP799TndvoBrhcp2JOz3HozuP4k3G1643eVR3vz19AfoQ5/mEPIXOLkxJH/DujJn1AmV0v7K/d7mpso+q00ortiQzb7bHjViDHnfV6uXRHcCSA9yVf40X0UFGxEKHQ40dhc6yKXW3RlpaAQyRs4nCeCUIrH49IFWahFKASuTvKfPAJpfM1I2uucwYD2ONHgYQbv8YeFwdQS4HPoeYTuTAiIF6QOqGmwlcaN31n2CfU54BZ89vTs9OgYIJ46kNLhQ4mLh68uZjd7n7DAzIggN59bpzqJ3gCk0AjVOzAamu0fRHZSu1R9MwXdfd5lESwTHrm22W9YY3q6CFhuBt1Nser1+9T5WlueKoXyUJUk87x8r2WJ62ziNvWFaDiRDYCm5VUOPqodwcpVWdlpToxyuQdio/0MgNbpv6oXMgjmAzWERu03C7Mlq4lCG4eRktXaK3hXAqdauh+f+4d2IKcKumRjBldAviNKpToVYjNwMg4QyFlI8Jk0CCnAvK6zSM3w998HDMQL0fO7g8ZromatNA2ln1RvMouneLySCJyeKepSokdJjlAm1Eq3+jd+8G7/Qu1LZpFp5ZUsopHynPYoaTFDlB1IeG8K8a2r9qTKBO4pQgL9MYQKOr2Wx8bRmcWxJYFbSJe1ne8lao1QPkN7RG4F/9yPzRhcIYHGvEo401qcUdZ+ZeLLx5DkIb/rW
nNLLYKUKJR5p5WIjGnPmQ4UWvI0A3Rh0PhiPxQweDgRhDYGkLQRDiUeF+Ivc7ohJnFPbI6pwS91ShkndwYFVCAWCvqOSJ2vMK8SNSmG8zo0TUMFfWowKrhw8Fqu4N6O8Wi/fKIF05x60uVBbocUxnEznftHrr7XrTCzE/6WiKWKB0DCWq/PGSQPijJEBrWgeIJnOoc6PVuHTOTOfJnLkxZz6MJgjmSWeIc08fNneB2wDxCubVxfMMyOOu4WargYXqx1ibPkZVih9cLdkPLkoh5dZfXKBXTm7MBSaMwRfQN2R9BLw7CdbWcKfc1j5+eGtc/2LN7V+so/gXj+twGPAYmHdlUgVhjaiWGPdU9ON3SkVT0MsLD/3CH9Cbp6rmznYyA6umTokDBMQfPfsgrS6cb/TrxA+/bhrUryMwhOoqgUhriFXppQH82ugySbPEiLi1GX77rSdaYAIDtNe0oUiCICN0D9dVh4EgNDoo+LHZ2NKPxUaKfuhQV4QXBDqg9gc/qE99mE4hOgS6kzfP78Phdh2VmgTZSGfXcehsuj20Ynvop/PQYM5DQ5tDQ+Fwqt5LqeNYmZWJmJWqtWuprrTWioJrmd1fnwuKOqwYXgx/1T/j91XXh6t4K6/e3LwQUdY+jnNVsel2DHHZCVp1PZgZ0/gqg1Fdbbn/+2nb1FjsQ6gsLPGL0Xiijoskvo7OOyY9phuYgh+AB1Cl/ocksDyFm1/OF+/KmiiI4L+zHoNxgZSHNxpSVb769HNJGPVswULjoMq7yYGUTnYbs3edpBFXXSobnChhCQS8CYf/oYFtPHwi+Pl0osZ6hEK/piiJa1CeB2eYCet/BErJFYh9p95g8noP8wy4lQEGdpDihpbAIl7svJnOJbDSDmhmJsdY3ff1cZ/kOOUOGXJjS6DNpcWfbLzKtawjugTJnTYyjtkFnKgk4JOza5atw+sqtx/nNR8GzeTRLPSq3q3gXB1xzr50d228USc4h0nZtPH+pKScoqO4SOvDN79jTA6+aoHHcdtRiEtD9/Zzzq8FBtvNkF+QvrrTL/X6URw0vu0J8D+dDPDWLprBrlRWChvXP05Oljv4Y0K08RswhwU263RbNE0XnRUPJ80v0OBc/BToU1/iQmO3lwz4CBzGQX0pDo1h20b8eac++cdnAAFJaT6db0Gt/XPdVTYpiAgwLAOsQuG2WOtHtx+LV23ryDeis7rTX/07n9X7KaWjkp1oFtuiTVs38p1Gl07qUOHf6aReayaAdmVjLd61PSI3CpH1j/7q3/is3k45HdiuW5vsaUTx/Yj7TTUUd/GUPU1kKVk8ihlNwQwZQcq3rXEoDdCPNSAEQ+8dsEyJWBb3DcYPJYVMjmalq6Z0qiPNAoFR2UF6ugE5V0nzDY14cv6SuiXiKca1MxENfGCzXRb1w+YzIh9afI7+Hm9YHEmSfSrwnyHjQenFCcsrdkWnqkBmHKFIz0mWb7e2v1oUSS6BJPWgaayGAoJ9LMGL3Wia9gYCUQ1AWXGKk0MVy0VAptiP1B5BZoNxOPsWQ25cqFALAOHscU/tY54SycZiGMtrYPkB3BOYRtP/HLqF0nnJhCBXVbme2zOl/T9JGwEOBWbgfd+XNQHryTIMhjs11K33nvGmLH5YYIDjv6mvcUw2XRcXjnXxqZGqJVkGLLpCFaVc/eG0m4IR0nR3Gjis3sfh9Dl6CuqbZGueIxd68231G/C/l5tGdOOmdP5ysFU09UtG19hSHztMueofy9/+g8a6HGpa0lzBi3UieTSCKfIKjqfTOceb6b7AitUjyqJ+ihb04lBcgzsim7vmM+VDGfPfAN61BtQInLAh0In0W1EKPLLgy9CtsZv75Y/z4Fp/AvM33/gmfrPdfZNm275ajxpcK4uXqjTv+ZfmHKOfNUBKYIIxaEnicq70/V+lHCpeXKyRZt3DlRVfv4dWUXs6nbeE8uMPf3BRw5psP9ORlCuDxsbrgQa797iGo+XzHu+RAvEUzL/FABu
S1il/YW56WWMhtVeZkpZLZKvUXu21tmUzgNfeCdCq5a8gXB98hOoqbZtXyhuZ7uKh7DDb5KiuMllHeLOeFkQOoCFdoFV1ymsKFIvjmglDO9qj/KcbdVl4nMnt+ACFutVWXcvSMDfvCeTfBXKQwqhlCVH5/Xx/lC81sqJc+EsPNsar1OSJdgQju1emUqO8XEm+nkHvG4LpbLDd+cPFbFRRcZBk0KNQC9Q+yRtVtjNKakmGDippvthQ6VgQkDFSq7NpZnuvvLZU+pw/LcnAhV1Hox6hKSIRuSR1pTy4EMF2XN4i8fAv/dop23jw+9Bxe0E9bvw/rwsTp3qucHj9AAAzSAFC2nzS4bqK2V+X52YoUyUP8KT0UDNat8ob1yVten0lqKzADssBzjlIr4BXMhbwUf5QbCcSkmrCqWTUKxoYDECJIWUCiU44SeDZ14HZ1HscCgGUYSQRWnZ57n0sq6HP3om+AG61pZHFKOigM0zJuPeEmB+Ny7tCFp1fcI6Ol9YY/zRitkMUh8uIfTjZTT6IfMaC1Yk/EiKGZzqsM4xt5Tg7KveS2zx/fcSKqAT1eNNxP9q7VZiWz5xcLop84PP383gTmCrRej0oakCJI9UONyGNyyMa66bh/Q51lfFW+dA0K8lfGx2CIVQ7bVbHI7GJkLKjKY2piO8LGuAaTVSLBcT34mYElI1OMT9O4W0/m8V8SVy7KkuusNMrdWSZTQJ3rvUye/klnA0CTXGqT7oRk5XcIZV7SdoD46KmAkDO0oq1FC6DKBqXc/b/2AwXvDSEjn4AJWq8osW8xQSqqDshRi8PECN+I7afn12BXQlM8Wv+Q1hR/pRT2CxafbTQQEoNRb5VNs4+zx2S63o0x7knuPgzUz0E2YfPVxlE0tAWbS9UFzV+Mj/99F48SecvgmqrdFm0or4oKQj1U6vScX8CjsT3XWbyP7vsqeJTD1U5kBSte6N8i9aV3suqXbjzk8j4RrZbJ9tviiLDTquaDyC1OJzzy1ijqrGZW2UFCjgsiOSLJgtd0TVhdklgaWzg95tMzs0OC4qpb7Et6KAq8vu6hbQFtv7vu+K5+FMJ+izEegQCluSpoBk0E5ZNWhXmQROrjy8ANtpKeD2CYmay3NhLJaFXKE1JMXMngbMFvoqDceEFo3BRLiDUQGIKAVlX/o2hg9oODkiCVQmRTZ7irz+tJ4K2MKq4Yw2VRrpsO1t4wnbO85xXUWdSHpX1HHaGHNoJkdtVlthFL6yqY5R+VUSXipc4D+O1MoldJHmRlLt/6iN0E31M+juTl9eyRy9VacGL60LW4dBeF6eJmqkOV9Gsj6tF0MD1JpjgIEIyR6GfoVrv1h+U+H73rGh1f87riN8GMxCpXC5MWYqLUYDLYJbiqhV0OMisTFFOJSRk5iiWQBsTJExRK5yaDh7z2fKQtsxyfeCsyDqRF+zXII6WT787owTh4fqclRsz7JrJlCwpovU6oCUh6oCXuWkOo5KH63IQIRq46gvLHOS/2v8OOZP2EBzcBpx27WdQWs6TekuSECWiJqw6sK2RQc0QMvAOWDRvk0o1a5wVNUmu3ciVY1USIwq5sSKo22JAsNrO92azjMXZGuS3Nk7sz/lppW1ZXYY+rrS3MtMwz2fseKv8kqmw3RLjZtXQePWbxCu4pskpV7G4XsJ4vzva6eIn2PboWe0n/YK6deaUDl0sCNvuGfdcs7DQbmtRDTI6iM4lZ84Z0cNdDaD8hGSl85NRGCuFbtb6gzqK6XbSGlvhsb/QTIZPsiBbAfY15vLv7LQknib98/orlv9m+5su18nLOH2FyUbCE9Ki/XDm8x4h7o5f2EwlBK7NGLY2MpaJ8Gl8cpwFltCEFtt6GZCDkCigwxRV/iCrKiQH7gMoKwmRJqMd3kS1lJubP8tN87l5cxNmuFAhBwqrSPmJomU/Sl3pYKEGidQFhx7N82KyZ2mGJyJQS/iiGSTWjZoGAlIFEYUotWjKTOpErqm4kfqU2yGE5Dq
GstBivNDUjqS5FwlIwhMZpIFMvnn5AEJCHGKFaUx3yjLZ/S+MENml8rSLIKaZAkoIzXOBLUIhHivbOJ5qcuOZm81OIb3MBN84qFbCaI2Ryy/AwEl7Nptd2VLz19fKkzN62vqDnN5H2JZwQDStGGJi7FErsOJWeQJwcha8loUUV6KP1KdO2HAluuhf5jIsLdaBnP2C7rbHTlnSEecx6n0ONlHTO56kKNQgZT/geSquw5Z1fLTNPvadd7NO7rraah0PWKtrNCwmgp09BlHlo/hugan06Qx/goZxmrSqaZO4i6/YBVHKh3hyFdyLymEv3mQXfI9gN5WUNKlS0KX2rAJZ7K7/319NDmUVzVbfNvniLuVpdeCm/w0u6Afl3xGzUPbWKncB98+w2cdefYphjY5c6eo0CslH+ZY7F6h5XEHoUVpt9ubEncjLtC/IXHSB5KZLJEgSxkwuvlIiUKfIsnrUbWJcpPILwoGmnMBTUXAFSCpgkqqCNHO6U2PsGzDnamcryfepJ1DzXapv775We9k9/z9nXpYFcxJ13wkGS6m9Uh3o7j0GT2AxHAh205rya/X3aqxVyVvLTZWBAez8Xkc1m7L1nm5BULFQqS5PMcgsb2fGWVui9wRaFjoJxgp1LxCwyjmc+cQJrUnarR52mYZ9sKGoqrmt19yWQy9CbF5jJoVegT2B2V54Q8XW6TLWk+ZQunhud4kIN4K21zRWqOGt5KP8grVoo6123719p8/+pAe3cKtK19LG/Xt2vgYOJ7vfkn3ae/rAp2vfkz70nDx3CIkr2EWoOCMV+BFvjvsknLrDqcjh7Y9Wa9ZWFblOFckaySluExovHbSknP3UeiSisN5X8h0/6f8wHNfOcYlialcRiTaebzLR2LBJTl4Rr34zISN5GanxJVif+SRx9lKTw8rlb4VaVwg709EEUwAXIS8jhJyBVhzC6HbPDo0Cz63sbaYO3J6412KORjiia7NGX3gTgEfwFjsTluqoxoqOytOUsQTK2tQ+Wyxut3iyxXyUmnfLBertuM+Lgq1eJzUXvawE53K1CeRBLG7dwc/ZImC+Ltb7Xsa22epRJdxUi/epmlIMKAdQO29dYpVdV0r9R0T9rcyv2OV1HrP2/+YD1kgb6Y++9F0kqTlZ5AH25TdhfQb+nog1RcaHdlct9ZI/T/HzusYUBOTDpshOWe+bbQKaQfHvWg4Jv8yWxMrblzgwl+FlfEW0zzneppc3lqbojoi/wTXdV+GxtwkRQb5dQGWtof86MoJikRaBJ/c4M3zTuu+ncB/bo+Y/tvaCvCUCb9OBgO/JVa+kV18pei5TDOTxKbXLIo336oAtufmFUsWo/YIBX/gE/+6Iij/DP5sDQXP0JBdyzeFONJZ/0TebUVzExh6EuViAq7B58mORXHqort/lb9+smlsd1tEN+EKDcKxjYKClXLkX4Qf+SxTOmlJcpBx+pcJzSW0D0ltUt8oauqRcAphz+/us9SxNYbGtMAwtNkwP6mFVCp0xXJLe2p/XVMau3ksXek5uZov6uFow5/ja/GaS2yZ2A6lLchcUFLY45JbXS3IDHLfYbmm3yQyAYXh7XZ4uUj9uZW53MK47ZYPrA3MRDpcO8k66LJDqOV1EyCeY6oLV6gkpV9K3vKyWTJPQ73pbWJnfnIsVuD3p1YU59OAj2gF7WtdCx/CqxCRcuBlclkrDpOxzMP4Gwxweey1ygNrOSy35ciTMcVsumg0kNQGmrx/baEmRgmG32BYAKVdBhhjhf7I2kOW21Uu8RLavVlV3EWkO2bDOBZf5begnlT0Z/3cnIamoDQhRfQgqn4Tsy7aInkjcEi5t43/RfFej9pPFYEOKBxqWDEXUUjmmM8FuvIBv5+/g5JlgXDPX+iCeqU0Cp314nv0LTi3KNi8dbJT30o4wwHPTKanehsdS5vKSNfuGsfrqi9rgkr2BQN4Rlsz9CLvcru8kaW/8GPFV6Pyo1fsnsETaQdffsM/cg+B4K2chq7r
V6zDNIT13bs2f6J4SIoIpAuSxDioSpGpTCZ7mFnwRYerGJfO/RYN9rMlBZp2HxeYIEC3zBq98gq+Clh/SxgNwhxibg9ChQ6VaHVepvQdpGe/CLbQXuhW5KuNiIJBXm2rMq20Wx58Xp8+tXLRaioxIFc6ER/lQOy9RTIcE5+DXK1UPfb5XNiaOT0TjrBV0gL9mISWPA3YbdFWjglacGhNP108T5ndyA85Tu7KtQVYqYQxYwWBceiszLKflPMaZJQD5ItIrMLjQFRNxBAiNooRBjm15rzxpJORrqZ+ttN1PRQlmUg7xS5WC93rxZxVcw+Y3rAdcd4TG3eop4hW/bdRPPoRyYN7jIsR2L+mzq+G/ilhfqddmUchicQ2KLmWBGKS7bkzRiRQadBCGXYKCblVEiWir8mdcrIA646hwymIpXTYMdR5rbUpNI3qvddVMMxQgJBJ4X/0eSYrMA6dSlOR0iYYBJNYRAsEQvPlMpt41IuCIKIY0omLF9mVHgwd1C7MkKsEcg1VpBmZMj+ks6eqlHvMW2oT7Wfc4nAu9hzMAEmoLilwHDFXCOxs5Mxyrc9Orbhwhlv0Ci7gxtadBDlutwD7VGhta3u6fheojSSj7w3tKJXDCdh2SVvNrn45tfz/xil/fvv52FymbPEQau7iKoZFRtmLr7pHrv2zk73UZmwbcj0ypCD7JLBI3KGZKsUyGxhXnW4IdUPJLZYXTpKzhmBHbR21QWdmGEBKYvdPvY5N3HZrCFv1e2QIXDt+xRn/owBz1DpVVPaLcF0vA8u72ITcqpnHls5vEmM6suhagpZwv7BnTBtLvIoFORqtYBBwaepUMV7/RhFYeyocIkPU/hDQezgUtpf5FqOshXCA213GoCLC0X1/6lHbYGMWasWCnrKSl9O6BJiGUWmDTD32GRmvCjelwFcTdgVuKuLlF5qq0hKSDJHV+vHEUEG4XaaY8nnQiBKItZ42LXMPDQLEODtNoP/7t4z6hyHolGGu0Q1mOUKkS3C0NRag0rIJXi2SjdiMtfsm6j8aMoDvDGSovLI6B2skEORRlJ4XqAdGjocXoAAmbqj2zKnKoJttaGdlo6eqVxAoDJT44oSMhHnOB5GZnv2fuJnK4zSRqxZ6mgsTU5yiZaZKvypJocEkc6kFIe6B3DhA4knx0wcJ2ddCI4DPmLtVRCZ94T2XaJs14EneJKRMfFE2IlgijkP62miXUmbklRuPZakLe640dYDgWjsLyH6erpNCaROnBbtLF0Ovk78MZC/SITS+kuOJtDuLArnOSHOkG4LBzfa8LidDAeFVg6mueRUCOk8oYmaMTF6cwcwiZtIUMsafbn1ZX5RCOhjhPFCFOayF1u/pOff46oeznLA49sDoh/6OwgxOl0bo2AlzihhODgGLq9Kqcop3rNplCbZT7Fnd9ADOkPZUfuge9dBovleLcqBJ9abnG/C6j1ACYqRbu4egKY93gEr0WBG5FjkZwaEObVYliSDQsyiEC5RXHp/2AZeFJuREigIedpTBP8z93ONvmgC7j1OtEPabCNRyIpC1oqwwQWv0oNFJG9/14wzd1TKxYxFBZch6Fl52tBiB2M8HFdct4faBHgPlZMIsOmoUrGLcdUD8tj9t3n5KQ2dhSXue3oBeCN18IENU8QWq85xqo1kG73JMYE+zKcAsJ/rJx7bpZA7gwPdy+CxhXtFFP4LkH+kmp/XOVAkrY7v5+K81N9njXjKdXS1ewvnWC5zxKkrqanFtjfSqmTchNJroRXHEOc3SduUtd6X1RkwUozpOJPztqWPOihljW3b4xKKXEQzqH2c1cGOmFjkzVj9uPlDeMeZ/fj84yERTK22KmBerxHgdoQF9LbXlzn5ES/phmxn7gKRXXNSYm9N164ME16yHOx6UxS0fdobHXFTFlhO0E5Q0YPLKB0mFL8cS0Q+x7WGdadruMd0pO4UuAmLJJmIsM+YDIuzDHlwUITctR4CbDpRWq/YkSfC7G6Ve3IwMZAqp
2t48mL//HEMMP3Ne0v1dYSxUWhRLPmm2xeCZ+3MnX9FXIl3bWN6yyi/HWHB8SQOOIB0pUPTlSIB7oex2LYUD4t8X0t528aQT5nZwOjTOLwNeHa+NXciPncsdvQ7a3s1v/LEVU67745O/6KNs/qJeX/9+wjvDQEzsHqGaPYDmOjqUA3cHN4rzARIGijh+xJCgTMj5B1NgAsZRAc7w2ieH+zR917EfsPT7S1A6Gst3k5eO10ebatOxTvqaEbpvC82dorJ3sVOZpvML4nWa3xIjYEjmRWeiYiQ4NbyZHqeEgSA3kZDmu3i1ZVMQZQbpeUd71x5Cwsllju+x4DOEc3CHfWR8ySavjSasbvsypoB2UVqKDVty/jdfFjdhNDXYRDxBGa1fh2/2CJs2MXthrkPUOgW1/tyAZUP5erkFmApF0f24Yos3SHKHT1kydbmdyToC+2gQ6UFt310qVrVBbVGGHNfNAlZ337y8sXSqYjArSm4N4zbAu+rs+Y63TeC5yC2SsaGCwRODd4e6SnRsjnUoRcMpidsNs5nv4E+cEKMSpw2owfzvJIQAi4AMBDSrYdSNWDIqjNn8jr7B4MoORnfPZ7JOZj+mbw2/kE7uFB0Yk7lcu0fj/5ipzfFALZP3hCf4ZtMyyofe3BXHwxyqGddOg1nYgYkXHITeHc9gqOk7R3e40zmmRudTYKSsI7/tns7kAjzITWTMCx2PhF5rbLvgd5lrpV9QfKFiUA2Q9AK2/e3nh1aNPm2TaweRGByOilyt/Otn57+w98oMlxvs63D2+Pkl7pVMkWm4Syq+RZZnYpmIjRU5JvMwJ+e4l1lcCabRRvfHSNhjbuhkWbQboptviqp7ymDoeowyoce1Xmt097KFJwiSsXdJkU8PUD+z7MlMn5EeCY0IQjs3nvtauJg3imri2T92s2Su3qpw2mEhiG9jJbSRz8j10ANqPsPj0FsUqaUMtmhyOd4ZYsJGacoHpzPa3TfM/rezGKhUndEJMau6Hz5IThhaqTm1jX2PM4HtJcA5Zc87WJbsckAGrAEcCIBhxPIChPLSnZoMT5sFYfKC3V1qoBvMcJI781PT5x0hjZYmQMwDpItB4EOHEDmn0m75OzdBhNrp7WuOAQSSqGbFLKZhxY2cW9birzyRMOqUL738d4K5EFty75UL2sX+ZFtAN48zF4BktRmhIjTbD7OX+VqJhDFKUEbtbUzmnQ1PV+bDUPDQOtreC7KbqIEJlT8jZoeLkTB9pZ6bujxzqL+KIpKMKPe0kxqqf+twexA8rOsLmWO4qziEo3YSwnYvmwr4mZ3RO4nIcopEdDOtKSVNtwtlkuvcDxFQ05zSkU/NhTVkUR9JtsfG+OvKtmmpaYsU8Z1gcTOWfHqBC4qe/uHf8o2J8UdNdT3KIxB8OCfnmKpslCDY+u5lGIR9iK/jt2ak8Oflt6vMz40LdFMSK5kn7mDke7Z36y/JC9Va5Y8gsvSF55k3lvmTth4xO6TppRTrG4x3Cnh+POmLSjzNe1zmQF0N1+ZiOYKfdEmKxWSC3E3aMt7AUX3SMXNmF0/SBWvwbapStJIGIJMjQeSpRtoNJkZocWdwpsje1qGE1jkuljLHXd9CW7AIk6ZoiZ/PyeDwLqw9OjeGNZTqHsYynauqJekM8njWKPHskYiTfZw7EhVNl3agJDwp0ViubpbA9m8Y6V9XlqnTWLvcqpB6R4AiEMySjWmYE/UYmfWouWttF59/BoiOcxxgbWTU4mR1CohX54TbWnLXoDGdPNtiAvTNkK1WPJoHb12tSRP91kIbFAd0l/yemKfGwTJHSJTiroKw7jMq4nnggOmxskDKIzvH9vcdHlEKhrzKhQUVvecVsscqrfCp4zyytVUmE6nsTH/sYf0ZSZmLjCpGuBu7wB438OGMq5d/aT4syPdqdgbPaYbt7NyWymaOsVEW6xSH3UUaRo/VqTq4IZ0htmfeatxMr2JSRhbTFNEUZ17DEQS0UL+jWvrii1f67Ndp
zFHUMUBdU4v+dWIC6Xh1BXXMtwqkVwMKchBGNDKJZWhQB+CSVqkiMa3nxZ3WNKFddMiSk5enWGBI3hnEfxIYU7j3JkGCEaemaFPh9rV895sJhdxfTL1XHrWSkCMVRkd6VU5A/dCo2qSIm6g/UeH1aB2OyCjMNmPlMPn0vZuxI+pWJ/bJMKYZadfX8QdEuD8SSj0g4AnW1QYjPHfWaiAKxkbhJVmdxMiG0QU1DWz4DeZptmkHDQhRW5idMqzifRbpkqGdOvx08wiE6Kz6dSvkK7x5p4zrH9GeE8eLFuRbwq28v2kiT6jskPS7okJj4VkziTQSommnr0xScVQ5bN2R3/ZxU99rnZOfPbZByxGCTfmSXk7aRQ5I8d0ncXbzEsW0Lkk4tLVWuUNsgatviVbiTbhOL6LKq0RN7no5kNlUjbLwubt584R4nF/ZRFC+DnORNTSD75rukAMq35s3RPSlvVeK703y3z5d8erCRa6543oNbPg6qtyN8O7LOPvFAUR0AhCyTP4q+A64qO4R1ludQXWImoSjdNMwgsiZHgHQG"
}
However, when decoding the text of the content using
base64.b64decode(my_coded_text).decode("UTF-8")
I invariably receive the following error:
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
What's the proper way to decode such strings?
EDITED TO ADD
I'm assuming the issue comes from this line from rfc1341: "A CRLF sequence in base64 data should be converted to a quoted-printable line break, but ONLY when converting text data" because decoding works fine with images.
However, I've yet to understand what's a CRLF sequence in base64 data and how it can be converted.
I'm not sure what kind of data is supposed to be in the "text" property, but it sure isn't text. You'll need to analyze the application that made this request to figure this out.
The decoded text starts out with the following data, which doesn't contain any readable text, and doesn't match with any known file format:
00000000: e2a1 7614 9214 be79 5194 3352 a680 7a29 ..v....yQ.3R..z)
00000010: e064 8850 0375 fe03 f106 0ab6 1351 55cf .d.P.u.......QU.
00000020: ad32 a1ef d6fb a2a4 96e4 30ab 203e eb8c .2........0. >..
00000030: e3f1 46dd 155c 2e56 21c4 7987 df94 833f ..F..\.V!.y....?
An analysis of the data shows that it's virtually incompressible. This means that it may be encrypted, or compressed with an unknown algorithm.
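A rough way to repeat this kind of check yourself is sketched below: decode the base64 to bytes (without forcing a text decoding) and look at the first bytes for a known signature. The `sniff` helper is hypothetical and only knows gzip and zlib headers plus a UTF-8 attempt; formats without a magic number will simply fall through to "binary". The short base64 string is the first 24 characters of the "text" value above.

```python
import base64

def sniff(data):
    """Best-effort guess at what a decoded HAR body contains."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    # A zlib header is two bytes, starts with 0x78 for the usual window
    # size, and the 16-bit value is a multiple of 31.
    if len(data) > 1 and data[0] == 0x78 and (data[0] * 256 + data[1]) % 31 == 0:
        return "zlib"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    return "utf-8 text"

# First 24 base64 characters of the captured body.
head = base64.b64decode("4qF2FJIUvnlRlDNSpoB6KeBk")

print(head.hex())
print(sniff(head))
```

The bytes `e2 a1 76` at the start explain the error message: `e2` announces a three-byte UTF-8 sequence, `a1` is a valid continuation byte, and `76` is not.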
I couldn't find a way to decode the base64 but I could find a workaround: sending the same request using selenium-requests and accessing the response which is, this time, not base64-encoded.
Code now looks like this:
for http_req in proxy.har["log"]["entries"]:
    req_url = http_req["request"]["url"]
    if url in req_url:
        # Captures the POST data
        payload = {}
        for post_param in http_req["request"]["postData"]["params"]:
            param_name = post_param["name"]
            param_value = post_param["value"]
            payload[param_name] = param_value
        # Resends the post request
        r = driver.request('POST', url, data=payload)
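The extraction step in the loop above can be tried in isolation on a plain dictionary, which may help when debugging. This is only a sketch: `post_params`, `matching_payloads` and `sample_har` are made-up names, and the sample data is invented to mirror the HAR keys used above. It also tolerates entries that have no `postData`.

```python
def post_params(entry):
    """Collect the POST parameters of one HAR entry into a dict."""
    post_data = entry["request"].get("postData", {})
    return {p["name"]: p["value"] for p in post_data.get("params", [])}

def matching_payloads(har, url_fragment):
    """Return the POST payloads of all entries whose URL contains url_fragment."""
    return [
        post_params(entry)
        for entry in har["log"]["entries"]
        if url_fragment in entry["request"]["url"]
    ]

# Invented sample data with the same shape as proxy.har.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "http://example.com/api/data",
                         "postData": {"params": [
                             {"name": "id", "value": "42"},
                             {"name": "lang", "value": "en"}]}}},
            {"request": {"url": "http://example.com/static/app.js"}},
        ]
    }
}

print(matching_payloads(sample_har, "/api/"))
```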
# -*- coding: UTF-8 -*-
import urllib.request
import re
import os
os.system("cls")
url=input("Url Link : ")
if(url[0:8]=="https://"):
    url=url[:4]+url[5:]
if(url[0:7]!="http://"):
    url="http://"+url
try :
    try :
        value=urllib.request.urlopen(url,timeout=60).read().decode('cp949')
    except UnicodeDecodeError :
        value=urllib.request.urlopen(url,timeout=60).read().decode('UTF8')
    par='<title>(.+?)</title>'
    result=re.findall(par,value)
    print(result)
except ConnectionResetError as e:
    print(e)
The TimeoutError has disappeared, but now a ConnectionResetError appears. What is this error? Is it a server problem, meaning I can't solve it on my side?
Don't give up!
Some websites require specific HTTP headers, in this case User-Agent. So you need to set this header in your request.
Change your request like this (lines 17-20 of your code):
# Make request object
request = urllib.request.Request(url, headers={"User-agent": "Python urllib test"})
# Open url using request object
response = urllib.request.urlopen(request, timeout=60)
# read response
data = response.read()
# decode your value
try:
    value = data.decode('CP949')
except UnicodeDecodeError:
    value = data.decode('UTF-8')
You can change "Python urllib test" to anything you want. Almost every server uses the User-Agent for statistical purposes.
Lastly, consider using appropriate whitespace, blank lines, and comments to make your code more readable. It will be good for you.
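Putting the pieces together, here is a sketch of the same idea split into small functions that can be tested without a network connection. The function names are my own. One deliberate difference from the code above, which you may or may not want: it tries UTF-8 before CP949, on the assumption that UTF-8 decoding is strict enough to reject most CP949 text, while the reverse order can silently produce wrong characters.

```python
import re
import urllib.request

def build_request(url, user_agent="Python urllib test"):
    """Return a Request carrying a User-Agent header (nothing is sent yet)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def decode_body(data, encodings=("utf-8", "cp949")):
    """Try each encoding in order and return the first clean decode."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going with replacement characters instead of failing.
    return data.decode("utf-8", errors="replace")

def extract_titles(html):
    """Return the contents of all <title> elements found on one line."""
    return re.findall(r"<title>(.+?)</title>", html)

request = build_request("http://example.com/")
page = "<html><title>한글 제목</title></html>"

print(extract_titles(decode_body(page.encode("cp949"))))
```

To fetch a real page, pass the request object to `urllib.request.urlopen(request, timeout=60)` and feed the result of `.read()` to `decode_body`.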
More reading:
HTTP/1.1: Header Field Definitions - to understand what the User-Agent header is.
21.6. urllib.request — Extensible library for opening URLs — Python 3.4.3 documentation - Always read the documentation. Link to the urllib.request.Request section.
I have a custom URL of the form
http://somekey:somemorekey#host.com/getthisfile.json
I have tried several approaches but keep getting errors:
Method 1:
from httplib2 import Http
ipdb> from urllib import urlencode
h=Http()
ipdb> resp, content = h.request("3b8138fedf8:1d697a75c7e50#abc.myshopify.com/admin/shop.json")
error :
No help on =Http()
Got this method from here
method 2 :
import urllib
urllib.urlopen(url).read()
Error :
*** IOError: [Errno url error] unknown url type: '3b8108519e5378'
I guess something is wrong with the encoding.
I tried:
ipdb> url.encode('idna')
*** UnicodeError: label empty or too long
Is there an easy way to make a GET call to a complex URL like this?
You are using a PDB-based debugger instead of an interactive Python prompt. h is a command in PDB. Use ! to prevent PDB from trying to interpret the line as a command:
!h = Http()
urllib requires that you pass it a fully qualified URL; your URL is lacking a scheme:
urllib.urlopen('http://' + url).read()
Your URL does not appear to use any international characters in the domain name, so you do not need to use IDNA encoding.
You may want to look into the third-party requests library; it makes interacting with HTTP servers much easier and more straightforward:
import requests
r = requests.get('http://abc.myshopify.com/admin/shop.json', auth=("3b8138fedf8", "1d697a75c7e50"))
data = r.json() # interpret the response as JSON data.
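If the credentials arrive embedded in the URL, they can be separated with the standard library before being handed to an HTTP client. The sketch below is Python 3 and assumes the URL has the usual `user:password@host` form, i.e. that the `#` in the question stands in for `@`; that is my assumption, not something stated in the question. The helper names are made up.

```python
import base64
from urllib.parse import urlsplit

def split_credentials(url):
    """Return (url_without_credentials, username, password)."""
    parts = urlsplit(url)
    netloc = parts.hostname
    if parts.port:
        netloc += ":%d" % parts.port
    clean = parts._replace(netloc=netloc).geturl()
    return clean, parts.username, parts.password

def basic_auth_header(user, password):
    """Build the header that HTTP Basic authentication sends."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}

url = "http://3b8138fedf8:1d697a75c7e50@abc.myshopify.com/admin/shop.json"
clean_url, user, password = split_credentials(url)

print(clean_url, user, password)
print(basic_auth_header(user, password))
```

The returned `user` and `password` can be passed as the `auth=(user, password)` tuple shown above, with `clean_url` as the address.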
The current de facto HTTP library for Python is Requests.
import requests
response = requests.get(
    "http://abc.myshopify.com/admin/shop.json",
    auth=("3b8138fedf8", "1d697a75c7e50")
)
response.raise_for_status() # Raise an exception if HTTP error occurs
print response.content # Do something with the content.
I am using python to send a request to a server. I get a cookie from the server. I am trying to decode the encoding scheme used by the server - I suspect it's either utf-8 or base64.
So I create my header and connection objects.
resp, content = httpobj.request(server, 'POST', headers=HTTPheader, body=HTTPbody)
Then I extract the cookie from the HTTP response:
cookie= resp['set-cookie']
I have tried str.decode() and unicode() but I am unable to get the unpacked content of the cookie.
Assume the cookie is
MjAyMTNiZWE4ZmYxYTMwOVPJ7Jh0B%2BMUcE4si5oDcH7nKo4kAI8CMYgKqn6yXpgtXOSGs8J9gm20bgSlYMUJC5rmiQ1Ch5nUUlQEQNmrsy5LDgAuuidQaZJE5z%2BFqAJPnlJaAqG2Fvvk5ishG%2FsH%2FA%3D%3D
The output I am expecting is
20213bea8ff1a309SÉì˜tLQÁ8².hÁûœª8<Æ
*©úÉzµs’Ïö¶Ñ¸•ƒ$.kš$5gQIPf®Ì¹,8�ºèA¦IœöZ€$ùå% *ao¾Nb²¶ÁöÃ
Try like this:
import urllib
import base64
cookie_val = """MjAyMTNiZWE4ZmYxYTMwOVPJ7Jh0B%2BMUcE4si5oDcH7nKo4kAI8CMYgKqn6yXpgtXOSGs8J9gm20bgSlYMUJC5rmiQ1Ch5nUUlQEQNmrsy5LDgAuuidQaZJE5z%2BFqAJPnlJaAqG2Fvvk5ishG%2FsH%2FA%3D%3D"""
res = base64.b64decode(urllib.unquote(cookie_val))
print repr(res)
Output:
"20213bea8ff1a309S\xc9\xec\x98t\x07\xe3\x14pN,\x8b\x9a\x03p~\xe7*\x8e$\x00\x8f\x021\x88\n\xaa~\xb2^\x98-\\\xe4\x86\xb3\xc2}\x82m\xb4n\x04\xa5`\xc5\t\x0b\x9a\xe6\x89\rB\x87\x99\xd4RT\x04#\xd9\xab\xb3.K\x0e\x00.\xba'Pi\x92D\xe7?\x85\xa8\x02O\x9eRZ\x02\xa1\xb6\x16\xfb\xe4\xe6+!\x1b\xfb\x07\xfc"
Of course, the result here is an 8-bit string, so you have to decode it to get the string that you want. I'm not sure which encoding to use, but here is the decoding result using unicode-escape (unicode literal):
>>> print unicode(res, 'unicode-escape')
20213bea8ff1a309SÉìtãpN,p~ç*$1ª~²^-\ä³Â}m´n¥`ÅBÔRT#Ù«³.K.º'PiDç?¨ORZ¡¶ûäæ+!ûü
Well, I hope this helps.
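For reference, a Python 3 sketch of the same two steps (URL-unquote, then base64-decode) is below. It keeps the result as bytes and splits off the 16-character ASCII prefix; what the remaining bytes mean is unknown, and they may well not be text in any encoding. The variable names are my own.

```python
import base64
from urllib.parse import unquote

cookie_val = (
    "MjAyMTNiZWE4ZmYxYTMwOVPJ7Jh0B%2BMUcE4si5oDcH7nKo4kAI8CMYgKqn6yXpgt"
    "XOSGs8J9gm20bgSlYMUJC5rmiQ1Ch5nUUlQEQNmrsy5LDgAuuidQaZJE5z%2BFqAJP"
    "nlJaAqG2Fvvk5ishG%2FsH%2FA%3D%3D"
)

# unquote (not unquote_plus) so that the '+' coming from %2B is kept.
decoded = base64.b64decode(unquote(cookie_val))

# The first 16 bytes are plain ASCII; the rest is arbitrary binary data.
prefix = decoded[:16].decode("ascii")
rest = decoded[16:]

print(prefix)
print(rest.hex())
```

Printing the binary part as hex avoids guessing an encoding. Decoding it with "latin-1" would never fail, since every byte maps to a character, but that alone does not make the result meaningful.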