Storing a random byte string in Python - python

For my project, I need to be able to store random byte strings in a file and read the byte string again later. For example, I want to store randomByteString from the following code:
>>> from os import urandom
>>> randomByteString=urandom(8)
>>> randomByteString
b'zOZ\x84\xfb\xceM~'
What would be the proper way to do this?
Edit: Forgot to mention that I also want to store 'normal' strings alongside the byte strings.

Code like:
>>> fh = open("e:\\test","wb")
>>> fh.write(randomByteString)
8
>>> fh.close()
Open the file in binary mode. If all the file operations happen in one place, a with block is cleaner because it closes the file for you (thanks to @Blender):
>>> with open("e:\\test","wb") as fh:
...     fh.write(randomByteString)
...
Update: if you want to store normal strings, you can encode them and then write them, like:
>>> "test".encode()
b'test'
>>> fh.write("test".encode())
Here fh is the same file handle opened previously.
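One way to keep byte strings and normal strings side by side in the same file is to turn the bytes into hexadecimal text before writing. The following is only a sketch under some assumptions: Python 3, a label that contains no line breaks, and the hypothetical helper names save_record and load_record, which are not from the answer above.

```python
import os
import tempfile


def save_record(path, label, blob):
    """Write a text label and a byte string as two lines of text."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(label + "\n")        # assumes the label has no line breaks
        fh.write(blob.hex() + "\n")   # bytes -> hexadecimal text


def load_record(path):
    """Read back the label and the original byte string."""
    with open(path, "r", encoding="utf-8") as fh:
        label = fh.readline().rstrip("\n")
        blob = bytes.fromhex(fh.readline().strip())
    return label, blob


random_bytes = os.urandom(8)
demo_path = os.path.join(tempfile.mkdtemp(), "record.txt")
save_record(demo_path, "test", random_bytes)
restored_label, restored_bytes = load_record(demo_path)
print(restored_label, restored_bytes == random_bytes)
```

The hex form doubles the size of the stored bytes, but the file stays readable as plain text and each value sits on its own line.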

Works just fine. You can't expect the output to make much sense though.
>>> import os
>>> with open("foo.txt", "wb") as fh:
...     fh.write(os.urandom(8))
...
>>> fh.close()
>>> with open("foo.txt", "r") as fh:
...     for line in fh.read():
...         print line
...
^J^JM-/
^O
R
M-9
J
~G
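To get the exact bytes back rather than garbled text, the file can be read in binary mode as well. This is a minimal sketch assuming Python 3; the temporary path and variable names are made up for the illustration.

```python
import os
import tempfile

original = os.urandom(8)
bin_path = os.path.join(tempfile.mkdtemp(), "foo.bin")

with open(bin_path, "wb") as fh:
    fh.write(original)

with open(bin_path, "rb") as fh:   # "rb" returns bytes, with no decoding
    read_back = fh.read()

print(read_back == original)
print(read_back.hex())             # printable view of the same bytes
```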

Related

Python Add a New Line without \n

I have this
f = open(os.path.join(r"/path/to/file/{}.txt").format(userid), "w")
f.write(str(points))
f.write(str(level))
f.write(str(prevtime))
f.close()
I know about with open(blah) as f: and prefer it, but with this code the values all end up on one line unless I append "\n" to each write, even if I write the file first and then switch to append mode. The reason \n is a problem is that when I read the data back using
f = open(os.path.join(r"blah\{}.txt").format(userid), "r")
lines = f.readlines()
points = float(lines[0])
I'll get an error telling me it can't interpret (for example: 500\n) as a float because it reads the \n. Any idea what I can do?
EDIT
I ended up fixing it by just not making it a float, but now that is giving me "ValueError: unconverted data remains". These issues only happen because of the line in the txt file that should contain a time in the format %H:%M
EDIT 2019
I see lots of people searching for the same question. My problem actually came down to my ignoring, and clearly not understanding, types in Python.
To answer the question that I imagine many people are searching for when they view this, \n is Python's newline character. An alternative (if using print()) would be to call print() with no arguments, which outputs a blank line.
So, you have something like this
>>> f = open("test.txt", "w")
>>> f.write(str(4))
>>> f.write(str(20))
>>> f.close()
>>> f = open("test.txt")
>>> f.readlines()
['420']
But, you need to write newlines, so just do so
>>> f = open("test.txt", "w")
>>> f.write("{}\n".format(4))
>>> f.write("{}\n".format(20))
>>> f.close()
>>> f = open("test.txt")
>>> f.readlines()
['4\n', '20\n']
>>> f.close()
If you need no newline characters, try read().splitlines()
>>> f = open("test.txt")
>>> f.read().splitlines()
['4', '20']
EDIT
As far as the time value is concerned, here's an example.
>>> from datetime import datetime
>>> time_str = datetime.now().strftime("%H:%M")
>>> time_str
'18:26'
>>> datetime.strptime(time_str, "%H:%M")
datetime.datetime(1900, 1, 1, 18, 26)
To write without a trailing newline, use sys.stdout.write. Without newlines, though, you might need some other separator, such as a space, to keep your data apart:
>>> import sys
>>> sys.stdout.write('hello world')
hello world>>>
If you keep the newlines, you can use rstrip() to strip them off when reading the values back:
lines = f.readlines()
points = float(lines[0].rstrip())
Alternatively, I prefer the more Pythonic way below:
lines = f.read().splitlines()
points = float(lines[0])
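Putting the pieces of this thread together, a round trip for the three values might look like the sketch below. It assumes Python 3, that level is an integer, and that prevtime is stored as an %H:%M string; save_stats and load_stats are hypothetical names, not from the question.

```python
import os
import tempfile
from datetime import datetime


def save_stats(path, points, level, prevtime):
    """Write each value on its own line."""
    with open(path, "w") as f:
        for value in (points, level, prevtime):
            f.write("{}\n".format(value))


def load_stats(path):
    """Read the values back and convert each to its own type."""
    with open(path) as f:
        lines = f.read().splitlines()   # newline characters are dropped here
    points = float(lines[0])
    level = int(lines[1])
    prevtime = datetime.strptime(lines[2], "%H:%M")
    return points, level, prevtime


stats_path = os.path.join(tempfile.mkdtemp(), "user.txt")
save_stats(stats_path, 500.0, 3, "18:26")
print(load_stats(stats_path))
```

Because splitlines() removes the line endings, strptime sees exactly "18:26" and has no leftover data to complain about.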

readlines() cannot read lines after using readline()

The following simple code reads a CSV file and returns the number of lines of the file. As you can see in the output, the file has 501 lines.
>>> import codecs
>>> f = codecs.open("tmp.csv", "r", "utf_8")
>>> print len(f.readlines())
501
But if I insert a readline() before using readlines(), the latter does not reach the end of the file.
>>> import codecs
>>> f = codecs.open("tmp.csv", "r", "utf_8")
>>> f.readline()
>>> print len(f.readlines())
1
Is there any basic mistake in my code? How can I mix readline() and readlines()? (actually I don't need to mix these two functions in my real program, but I am just curious...)
You can download the file at
https://dl.dropboxusercontent.com/u/16653989/tmp/tmp.csv
This has something to do with the codecs module: when you do the same thing with the regular Python open() function, it works as expected:
>>> f = open('tmp.csv')
>>> f.readline()
>>> print len(f.readlines())
500
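If the file still needs to be decoded as UTF-8, io.open (available from Python 2.6, and the same function as the built-in open in Python 3) is one alternative to codecs.open. The sketch below builds its own 501-line sample file instead of using the linked tmp.csv, and count_remaining_lines is a hypothetical helper name.

```python
import io
import os
import tempfile


def count_remaining_lines(path):
    """Read one line with readline(), then count what readlines() returns."""
    with io.open(path, "r", encoding="utf-8") as f:
        first = f.readline()
        rest = f.readlines()
    return first, len(rest)


# Build a 501-line sample file (a stand-in for the tmp.csv in the question).
sample_path = os.path.join(tempfile.mkdtemp(), "tmp.csv")
with io.open(sample_path, "w", encoding="utf-8") as f:
    for i in range(501):
        f.write(u"row{},value\n".format(i))

first_line, remaining = count_remaining_lines(sample_path)
print(remaining)
```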

Python JSON preserve encoding

I have a file like this:
aarónico
aaronita
ababol
abacá
abacería
abacero
ábaco
#more words, with no ascii chars
When I read and print that file to the console, it prints exactly the same, as expected, but when I do:
f.write(json.dumps({word: Lookup(line)}))
This is saved instead:
{"aar\u00f3nico": ["Stuff"]}
I expected:
{"aarónico": ["Stuff"]}
I need to get the same when I json.loads() it, but I don't know where or how to do the encoding, or whether it's needed to get it to work.
EDIT
This is the code that saves the data to a file:
with open(LEMARIO_FILE, "r") as flemario:
    with open(DATA_FILE, "w") as f:
        while True:
            word = flemario.readline().strip()
            if word == "":
                break
            print word #this is correct
            f.write(json.dumps({word: RAELookup(word)}))
            f.write("\n")
And this one loads the data and returns the dictionary object:
with open(DATA_FILE, "r") as f:
    while True:
        new = f.readline().strip()
        if new == "":
            break
        print json.loads(new) #this is not
I cannot lookup the dictionaries if the keys are not the same as the saved ones.
EDIT 2
>>> import json
>>> f = open("test", "w")
>>> f.write(json.dumps({"héllö": ["stuff"]}))
>>> f.close()
>>> f = open("test", "r")
>>> print json.loads(f.read())
{u'h\xe9ll\xf6': [u'stuff']}
>>> "héllö" in {u'h\xe9ll\xf6': [u'stuff']}
False
This is normal and valid JSON behaviour. The \uxxxx escape is also used by Python, so make sure you don't confuse Python literal representations with the contents of the string.
Demo in Python 3.3:
>>> import json
>>> print('aar\u00f3nico')
aarónico
>>> print(json.dumps('aar\u00f3nico'))
"aar\u00f3nico"
>>> print(json.loads(json.dumps('aar\u00f3nico')))
aarónico
In python 2.7:
>>> import json
>>> print u'aar\u00f3nico'
aarónico
>>> print(json.dumps(u'aar\u00f3nico'))
"aar\u00f3nico"
>>> print(json.loads(json.dumps(u'aar\u00f3nico')))
aarónico
When reading and writing from and to files, and when specifying just raw byte strings (and "héllö" is a raw byte string) then you are not dealing with Unicode data. You need to learn about the differences between encoded and Unicode data first. I strongly recommend you read at least 2 of the following 3 articles:
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
You were lucky with your "héllö" Python raw byte string representation: Python managed to decode it automatically for you. The value read back from the file is perfectly normal and correct:
>>> print u'h\xe9ll\xf6'
héllö
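If the goal is to see the accented characters in the saved file itself, json.dumps accepts ensure_ascii=False. The sketch below assumes Python 3 and a UTF-8 encoded file; the sample entry and temporary path are made up for the illustration. Both forms load back to the same string, so this only changes how the file looks.

```python
import json
import os
import tempfile

entry = {"aarónico": ["Stuff"]}

# ensure_ascii=False keeps non-ASCII characters instead of \uXXXX escapes.
line = json.dumps(entry, ensure_ascii=False)

json_path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(json_path, "w", encoding="utf-8") as f:
    f.write(line + "\n")

with open(json_path, "r", encoding="utf-8") as f:
    loaded = json.loads(f.readline())

print(line)
print("aarónico" in loaded)
```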

Map over csv in python

I'm trying to use "map" on a csv file in python.
However, the line map(lambda x: x, reseller_csv) gives nothing.
I've tried iterating over the csv object, and it works fine and can print the rows.
Here's the code.
# imports
import csv
# Opens files
ifile = open('C:\Users\josh.SCL\Desktop\Records.csv', 'r')
ofile = open('C:\Users\josh.SCL\Desktop\RecordsNew.csv', 'w')
resellers_file = open('C:\Users\josh.SCL\Desktop\Reseller.csv', 'r')
# Setup CSV objects
csvfile = csv.DictReader(ifile, delimiter=',')
reseller_csv = csv.DictReader(resellers_file, delimiter=',')
# Get names only in resellers
resellers = map(lambda x: x.get('Reseller'), reseller_csv)
A csv.DictReader is a use-once gadget. You probably ran it a second time.
>>> import csv
>>> iterable = ['Reseller,cost', 'fred,100', 'joe,99']
>>> reseller_csv = csv.DictReader(iterable)
>>> map(lambda x: x.get('Reseller'), reseller_csv)
['fred', 'joe']
>>> map(lambda x: x.get('Reseller'), reseller_csv)
[]
>>>
While we're here:
(1) [Python 2.x] Always open csv files in BINARY mode.
[Python 3.x] Always open csv files in text mode (the default), and use newline=''
(2) If you insist on hardcoding file paths in Windows, use r"...." instead of "...", or use forward slashes -- otherwise \n and \t will be interpreted as control characters.
The following works for me:
>>> data = ["name,age", "john,32", "bob,45"]
>>> list(map(lambda x: x.get("name"), csv.DictReader(data))) # Python 3 so using list to see values.
['john', 'bob']
Are you sure you get any data at all from your DictReader? Do you read any data from it prior to that, exhausting the reader perhaps?
First, on your specific problem: check whether there is actually a key named 'Reseller'; chances are it's there with different capitalization or an extra space. To see the list of all the keys (assuming a non-exhausted DictReader):
>>> csvfile.next().keys()
Otherwise the map() should work fine. But I'd argue it's more readable (and faster!) done like this:
resellers = [x['Reseller'] for x in reseller_csv]
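The points made in these answers can be checked with a small in-memory example. This is a sketch assuming Python 3 and made-up sample rows; it uses the reader's fieldnames attribute to inspect the headers without consuming a data row.

```python
import csv

rows = ["Reseller,cost", "fred,100", "joe,99"]
reader = csv.DictReader(rows)

# fieldnames reads only the header line, so no data row is lost.
headers = reader.fieldnames

# The first pass consumes every data row.
resellers = [row["Reseller"] for row in reader]

# A second pass over the same reader finds nothing left.
second_pass = [row["Reseller"] for row in reader]

print(headers)
print(resellers)
print(second_pass)
```

If the names are needed more than once, keeping the list from the first pass avoids re-reading the file.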

Python file input string: how to handle escaped unicode characters?

In a text file (test.txt), my string looks like this:
Gro\u00DFbritannien
Reading it, python escapes the backslash:
>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien'
How can I have this interpreted as unicode? decode() and unicode() won't do the job.
The following code writes Gro\u00DFbritannien back to the file, but I want it to be Großbritannien
>>> input.decode('latin-1')
u'Gro\\u00DFbritannien'
>>> out = codecs.open('out.txt', 'w', 'utf-8')
>>> out.write(input)
You want to use the unicode_escape codec:
>>> x = 'Gro\\u00DFbritannien'
>>> y = unicode(x, 'unicode_escape')
>>> print y
Großbritannien
See the docs for the vast number of standard encodings that come as part of the Python standard library.
Use the built-in 'unicode_escape' codec:
>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien\n'
>>> input.decode('unicode_escape')
u'Gro\xdfbritannien\n'
You may also use codecs.open():
>>> import codecs
>>> file = codecs.open('test.txt', 'r', 'unicode_escape')
>>> input = file.readline()
>>> input
u'Gro\xdfbritannien\n'
The list of standard encodings is available in the Python documentation: http://docs.python.org/library/codecs.html#standard-encodings
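The answers above are written for Python 2, where str is a byte string. In Python 3, a rough equivalent is to encode the text to bytes first and then decode it with the same codec. The sketch below assumes the input is pure ASCII apart from the escapes; decode_escapes is a hypothetical helper name.

```python
def decode_escapes(text):
    """Turn literal \\uXXXX sequences in ASCII text into real characters."""
    # encode("ascii") raises UnicodeEncodeError if the input is not ASCII.
    return text.encode("ascii").decode("unicode_escape")


raw = "Gro\\u00DFbritannien"   # what readline() returns for the file content
decoded = decode_escapes(raw)
print(decoded)
```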
