I am trying to write the results of a re.findall call to a file. I tried
output_file.write(findall(tags_pattern, searchtext))
but I got a TypeError. How do I convert this to a type that can be written to a file?
Thanks
The easiest way is to JSON-encode it. See the json module.
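For example, a minimal sketch of that idea (the output file name is an assumption; tags_pattern and searchtext are the variables from the question):
import json
import re

matches = re.findall(tags_pattern, searchtext)
with open('matches.json', 'w') as output_file:
    json.dump(matches, output_file)   # writes the list as a JSON array of strings
Calling json.load on the same file later gives the list back.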
You have the str(res) and repr(res) functions, but you could also do ','.join(res)
re.findall returns a list of matches found in searchtext for tags_pattern. Those matches are just strings. The easiest way to convert a list of strings into a single string which can be written to a file is to call str.join on a string representing the separator you want to insert between the strings in the list. For example, you may call '\n'.join(findall(tags_pattern, searchtext)) if you want to store each match on its own line.
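A minimal sketch of that approach (output.txt is just a placeholder name):
from re import findall

matches = findall(tags_pattern, searchtext)   # tags_pattern/searchtext as in the question
with open('output.txt', 'w') as output_file:
    output_file.write('\n'.join(matches))     # one match per line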
The pickle module is built to quickly store Python structures in a file. It is not nearly as portable as JSON or some other serialization format, but depending on your purposes, it may be just enough.
To use pickle:
import re, pickle
r = re.findall(pattern, text)
with open('results.pkl', 'wb') as resultsfile:
    pickle.dump(r, resultsfile)
To recover the list, use pickle.load:
with open('results.pkl', 'rb') as resultsfile:
    r2 = pickle.load(resultsfile)
I'd be wary of using this in production code, or where you need to transmit the re.findall results to a web client, but for quick testing and local storage, this is probably the easiest.
I need to reach a specific byte count and want to fill the rest of the document with padding such as null bytes or other low ASCII characters. I've heard there's a way to do this with Python, but I'm unsure how. Any suggestions?
You can use chr() for this, for example:
t = "BLA"
t += chr(33)
print(t)
will print BLA!. 33 is the decimal value for "!"; use any other value you like (for a null byte, chr(0)), loop over your string, and write it to a new file (use open, read, and write, of course, to do this at the file level).
Edit: Use
file_ptr = open("filename.txt", "r")
t = file_ptr.read()
for example to grab your file's contents.
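Putting those pieces together, here is a minimal padding sketch; the file names and the target byte count are assumptions, and binary mode is used so the count is exact:
target_size = 1024                               # assumed target byte count

with open("filename.txt", "rb") as f:            # read the existing contents as bytes
    data = f.read()

padding = b"\x00" * (target_size - len(data))    # null bytes; assumes the file is smaller than target_size
with open("padded.txt", "wb") as f:
    f.write(data + padding)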
I'm trying to write a CSV file from JSON data. During that, I want to write '001023472', but it's being written as '1023472'. I have searched a lot but didn't find an answer.
The value is of type string before writing; the problem occurs while writing it into the file.
Thanks in advance.
Convert the number to a string with the % formatting operator; in your case: "%09d" % number.
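For instance, with the value from the question:
>>> "%09d" % 1023472
'001023472'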
Use the format builtin or format string method.
>>> format(1023472, '09')
'001023472'
>>> '{:09}'.format(1023472)
'001023472'
If your "number" is actually a string, you can also just left-pad it with '0''s:
>>> format('1023472', '>09')
'001023472'
The Python docs generally eschew % formatting, saying it may go away in the future and is also more finicky; for new code there is no real reason to use it, especially in 2.7+.
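Since the question is about CSV output specifically, here is a minimal sketch showing that a zero-padded string survives the csv writer unchanged; the file name and row layout are made up:
import csv

value = format(1023472, '09')            # '001023472'
with open('out.csv', 'w') as csvfile:    # on Python 3, add newline=''
    writer = csv.writer(csvfile)
    writer.writerow(['id', value])       # the field is written as 001023472
If the zeros still disappear, the stripping is usually happening in whatever re-opens the CSV (for example, a spreadsheet parsing the field as a number), not in the writing step.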
Ok, so I'm using an API and trying to display a list that the API returns. The challenge is that I need to use .json() to go through the response, but then it makes the list a JSON list and it looks wrong.
checkList #is the return value
>>> checkList
u'{"list":["ad","ae"]}'
>>> str(checkList.json()['list'])
"[u'ad', u'ae']"
I'm using a Python shell. How would I remove the " u' " from each element in the list? Thanks
The issue is not really about removing the u from the start of those strings. The easiest way to do this is to import the json module and call json.dumps(checkList.json()['list']); it will do the right thing for you. The strings the json module returns are unicode objects (and are shown in the repr as unicode literals). To "remove" them you need to handle the unicode strings properly, and this is the easiest way that will result in the least hair pulling and the most forward compatibility with Python 3.
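A quick illustration in the shell, using the list from the question:
>>> import json
>>> json.dumps([u'ad', u'ae'])
'["ad", "ae"]'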
I want to generate bytes sequence containing string length and string content.
For example, for string 'hello' I want to get b'\x05hello'
After some docs reading I've wrote a function:
def LenAndStrBytes(strdata):
    return bytearray([len(strdata)&0xFF])+strdata if strdata!=[] else 0
Questions:
I'm new to Python programming and I wonder what the best Python practices are for concatenating different types of data, like an int and something iterable like a bytearray.
Did I write my function optimally?
Well, just as larsmans points out, it depends on your usage. If you can get the result with clear code that satisfies the constraints of your context, that is a suitable practice.
No need for &0xFF; bytearray already checks that values are between 0 and 255.
>>> strdata = 'hello'
>>> bytearray([len(strdata)]) + strdata if strdata else bytearray()
bytearray(b'\x05hello')
And you could also do:
import struct
bytearray(struct.pack('B%ds' % len(strdata), len(strdata), strdata))
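For what it's worth, if this ever needs to run on Python 3 (where strdata would be a str and the result should be bytes), here is a minimal equivalent sketch; the UTF-8 encoding choice is an assumption:
def len_and_str_bytes(strdata):
    # prefix the encoded string with its length as a single byte;
    # bytes([...]) raises ValueError if the length exceeds 255
    encoded = strdata.encode('utf-8')
    return bytes([len(encoded)]) + encoded

print(len_and_str_bytes('hello'))   # b'\x05hello'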
Are you trying to serialise binary data before you write it to a file (or send it over a network)?
Did you perhaps mean to use the pickle module for data serialisation instead?
I have a dictionary of dictionaries in Python:
d = {"a11y_firesafety.html":{"lang:hi": {"div1": "http://a11y.in/a11y/idea/a11y_firesafety.html:hi"}, "lang:kn": {"div1": "http://a11y.in/a11ypi/idea/a11y_firesafety.html:kn}}}
I have this in a JSON file and I encoded it using json.dumps(). Now when I decode it using json.loads() in Python I get a result like this:
temp = {u'a11y_firesafety.html': {u'lang:hi': {u'div1': u'http://a11y.in/a11ypi/idea/a11y_firesafety.html:hi'}, u'lang:kn': {u'div1': u'http://a11y.in/a11ypi/idea/a11y_firesafety.html:kn'}}}
My problem is with the "u" which signifies the Unicode encoding in front of every item in my temp (dictionary of dictionaries). How do I get rid of that "u"?
Why do you care about the 'u' characters? They're just a visual indicator; unless you're actually using the result of str(temp) in your code, they have no effect on your code. For example:
>>> test = u"abcd"
>>> test == "abcd"
True
If they do matter for some reason, and you don't care about consequences like not being able to use this code in an international setting, then you could pass in a custom object_hook (see the json docs) to produce dictionaries with string contents rather than unicode.
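A minimal sketch of that object_hook idea, assuming Python 2 and a variable json_text holding the raw JSON string (both assumptions on my part):
import json

def to_byte_strings(d):
    # object_hook is called with each decoded JSON object (dict);
    # encode unicode keys and values to UTF-8 byte strings (Python 2)
    out = {}
    for key, value in d.items():
        if isinstance(key, unicode):
            key = key.encode('utf-8')
        if isinstance(value, unicode):
            value = value.encode('utf-8')
        out[key] = value
    return out

temp = json.loads(json_text, object_hook=to_byte_strings)
Note that object_hook is applied bottom-up, so nested dictionaries are already converted by the time their parent is processed; strings inside lists are not touched, but for the nested-dictionary data in the question that is not an issue.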
You could also use this:
import fileinput

fout = open("out.txt", 'a')
for line in fileinput.input("in.txt"):
    # strip the u prefixes from repr-style output (Python 2 print syntax below)
    line = line.replace("u\"", "\"").replace("u\'", "\'")
    print >> fout, line
fout.close()
Typical JSON responses from standard websites show up with both representations, u' and u". This snippet gets rid of both of them. It may not be required, as the prefix doesn't hinder any logical processing, as a previous commenter mentioned.
There is no "unicode" encoding, since unicode is a different data type and I don't really see any reason unicode would be a problem, since you may always convert it to string doing e.g. foo.encode('utf-8').
However, if you really want to have string objects up front, you should probably create your own decoder class and use it while decoding the JSON.
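As an illustration of the encode('utf-8') route, here is a small recursive helper (the name byteify is made up, and Python 2 is assumed) that walks an already-decoded structure:
def byteify(obj):
    # recursively encode every unicode string in a decoded JSON structure
    # to a UTF-8 byte string (Python 2)
    if isinstance(obj, dict):
        return dict((byteify(k), byteify(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return [byteify(x) for x in obj]
    if isinstance(obj, unicode):
        return obj.encode('utf-8')
    return obj
Calling byteify(json.loads(...)) then gives the plain-str version of the temp dictionary from the question.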