I simply want to replace the last character in my file. The reason is that at the last point where I write to the file, an extra , is included at the end. I don't want to write that trailing , at all; instead I would like to replace it with a ] if possible. Here is my attempt:
reader = csv.DictReader(open(restaurantsCsv), delimiter=';')
with open(fileName, 'w+') as textFile:
    textFile.write('[')
    for row in reader:
        newRow = {}
        for key, value in row.items():
            if key == 'stars_count' or key == 'reviews_count':
                newRow[key] = float(value)
            else:
                newRow[key] = value
        textFile.write(json.dumps(newRow) + ',')
    textFile.seek(-1, os.SEEK_END)
    textFile.truncate()
    textFile.write(']')
It all works properly until I get to textFile.seek(-1, os.SEEK_END), where I want to seek to the end of the file and remove that last , in the file, but I get an error saying io.UnsupportedOperation: can't do nonzero end-relative seeks. Therefore I tried opening my file in 'wb+' mode, but if I do that, then I can only write bytes to my file, not strings. Is there any way I can simply replace the last character in my file with a ] instead of a ,? I know I can open the file, truncate it, then open the file again to append the final ], but that seems inefficient (as shown here):
with open(filename, 'rb+') as filehandle:
    filehandle.seek(-1, os.SEEK_END)
    filehandle.truncate()
with open(filename, 'a') as filehandle:
    filehandle.write(']')
Any help would be appreciated. Thanks!
You can slightly modify your approach and instead of appending a comma at the end of each line, you just prepend a comma to every line but the first:
reader = csv.DictReader(open(restaurantsCsv), delimiter=';')
with open(fileName, 'w+') as text_file:
    text_file.write('[')
    for index, row in enumerate(reader):
        new_row = {}
        for key, value in row.items():
            if key in ('stars_count', 'reviews_count'):
                new_row[key] = float(value)
            else:
                new_row[key] = value
        if index != 0:
            text_file.write(',')
        text_file.write(json.dumps(new_row))
    text_file.write(']')
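The comma-before-each-element pattern can be sketched in isolation; the rows below are hypothetical stand-ins for the converted CSV records:

```python
import json

# Hypothetical rows standing in for the converted CSV records.
rows = [{'name': 'A', 'stars_count': 4.5}, {'name': 'B', 'stars_count': 3.0}]

parts = ['[']
for index, row in enumerate(rows):
    if index != 0:
        parts.append(',')  # comma before every element except the first
    parts.append(json.dumps(row))
parts.append(']')
text = ''.join(parts)

# The result is valid JSON and round-trips back to the original rows.
print(json.loads(text) == rows)
```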
You can replace the last character of the file, i.e. the last character of the last line, with sed.
To preview the result without modifying the file:
sed '$ s/.$/]/' file_name
To replace the last character of the last line (the comma, in your case) with ']' and change the file in place:
sed -i '$ s/.$/]/' file_name
To run it from within Python:
import os
os.system("sed -i '$ s/.$/]/' file_name")
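The subprocess module is the usual alternative to os.system for running the same sed command; a minimal sketch, using a throwaway demo file in place of the real one:

```python
import subprocess

# Create a small demo file ending in a comma (placeholder content).
with open("file_name", "w") as f:
    f.write('[{"a": 1},')

# Replace the last character of the last line with ']' in place.
subprocess.call(["sed", "-i", "$ s/.$/]/", "file_name"])

with open("file_name") as f:
    print(f.read())  # [{"a": 1}]
```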
As suggested by @Chris, accumulate all the new rows in a list, then write all those rows at once. Then you won't have that pesky hanging comma.
......
rows = []
for row in reader:
    newRow = {}
    for key, value in row.items():
        if key == 'stars_count' or key == 'reviews_count':
            newRow[key] = float(value)
        else:
            newRow[key] = value
    rows.append(newRow)
textFile.write(json.dumps(rows))
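With all the rows in a list, a single json.dump call to the file handle writes a valid array directly; a sketch with hypothetical rows:

```python
import json

# Hypothetical rows standing in for the converted CSV records.
rows = [{'name': 'A', 'stars_count': 4.5}, {'name': 'B', 'stars_count': 3.0}]

with open('restaurants.json', 'w') as text_file:
    json.dump(rows, text_file)  # writes '[...]' with the commas handled for you

with open('restaurants.json') as text_file:
    print(json.load(text_file) == rows)
```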
Related
I have a dict with a few {key: value} pairs. I also have a file with some content.
The file is something like this:
some random text
...
...
text-which-matches-a-key
some important lines
...
...
some other important text until end of file
What I want is to search/iterate through the file until a line matches a key of the dict, then append the corresponding value before/after some important lines
What I've tried to do is this:
with open('file', 'a+') as f:
    for key in a:
        if f.readlines() == key:
            f.writelines(a[key])
    f.close()
where a is a dict, with many key,value pairs.
I'd be happy if the results are something like:
some random text
...
...
text-which-matches-a-key
some important lines
value corresponding to the key
...
...
some other important text until end of file
or:
some random text
...
...
text-which-matches-a-key
value corresponding to the key
some important lines
...
...
some other important text until end of file
Any help is appreciated.
P.S: Using Python 2.7, on PyCharm, on Windows 10.
The script below cannot insert multiple dictionary values. Only the value of the last dictionary key that appears before 'some important lines' in the file is inserted.
dictionary = {'text-which-matches-a-key': 'value corresponding to the key'}

# Open file and fill a list in which each element is a line.
f = open('file', 'r')
lines = f.readlines()
f.close()

# Empty the file.
f = open('file', 'w')
f.close()

key_occurs_in_text = False

# Insert dictionary value at the right place.
for index, line in enumerate(lines):
    '''Remove the newline character '\n'
    to the right of the strings in the file.
    The lines don't match dictionary keys if
    the dictionary keys don't have newlines
    appended.'''
    line = line.rstrip()
    # Check if any line is a key in the dictionary.
    if line in dictionary.keys():
        key_occurs_in_text = True
        key_occurring_in_text = line
    ''' 'some important lines' is reached and a key
    of the dictionary has appeared as a line in the
    file. Save the list index which corresponds to
    the line after or before 'some important lines' in
    the variable insert_index. '''
    if 'some important lines' == line and key_occurs_in_text:
        insert_index = index + 1
        # insert_index = index - 1

'''A line in the file
is a key in the dictionary.
Insert the value of the key at the index we saved
in insert_index.
Prepend and append newline characters to match
the file format.'''
if key_occurs_in_text:
    lines.insert(insert_index, '\n' + dictionary[key_occurring_in_text] + '\n')

# Write the changed file content to the empty file.
f = open('file', 'w')
for line in lines:
    f.write(line)
f.close()
Your second version, i.e. inserting directly after the key line, is quite simple.
If you don't mind loading the whole file into memory, it's just:
with open('file', 'r') as f:
    txt = f.readlines()

with open('file', 'w') as f:
    for line in txt:
        f.write(line)
        if line.strip() in block:
            f.write(block[line.strip()])
with block being your dictionary.
However, if you do not want to load the whole file at once, you have to write to a different file than your source file, because inserting into files (as opposed to overwriting portions of a file) is not possible:
with open('source_file', 'r') as fs, open(target_file, 'w') as ft:
    for line in fs:
        ft.write(line)
        if line.strip() in block:
            ft.write(block[line.strip()])
Of course it would be possible to e.g. rename the source file first and then write everything to the original filename.
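The write-then-rename idea can be sketched as follows; the file names and content are placeholders, and os.replace swaps the temporary file in under the original name:

```python
import os

block = {'text-which-matches-a-key': 'value corresponding to the key'}

# Demo input file (placeholder content).
with open('file', 'w') as f:
    f.write('some random text\ntext-which-matches-a-key\nsome important lines\n')

src, tmp = 'file', 'file.tmp'

# Write the merged content to a temporary file...
with open(src, 'r') as fs, open(tmp, 'w') as ft:
    for line in fs:
        ft.write(line)
        if line.strip() in block:
            ft.write(block[line.strip()] + '\n')

# ...then put it in place of the original in one step.
os.replace(tmp, src)
```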
Regarding the first version, i.e. leaving several important lines after the key and inserting the block after those lines: that would require a proper definition of how to decide which or how many lines are important.
However, if it's just about a fixed number of lines N after the key line:
with open(file, 'r') as f:
    txt = f.readlines()

with open(file, 'w') as f:
    N = 3
    i = -1
    for line in txt:
        f.write(line)
        if i == N:
            f.write(block[key])
            i = -1
        if line.strip() in block:
            key = line.strip()
            i = 0
        if i >= 0:
            i += 1
... or without loading all at once into memory:
with open('source_file', 'r') as fs, open(target_file, 'w') as ft:
    N = 3
    i = -1
    for line in fs:
        ft.write(line)
        if i == N:
            ft.write(block[key])
            i = -1
        if line.strip() in block:
            key = line.strip()
            i = 0
        if i >= 0:
            i += 1
There's a difference between readline() and readlines() (the former reads 1 line, the latter reads all lines and returns a list of strings).
See: https://docs.python.org/2.7/tutorial/inputoutput.html
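A quick sketch of that difference, using a throwaway demo file:

```python
# readline() returns one line as a string; readlines() returns all lines as a list.
with open('demo.txt', 'w') as f:
    f.write('first\nsecond\nthird\n')

with open('demo.txt') as f:
    print(f.readline())   # 'first\n' -- one line, as a string

with open('demo.txt') as f:
    print(f.readlines())  # ['first\n', 'second\n', 'third\n'] -- all lines, as a list
```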
It'd be easier to just read the entire file, apply your changes to it, and write it back to a file once you're done, rather than trying to edit the file in place.
See: Editing specific line in text file in python
You don't have to manually close the file when you're using the with-statement. The file will automatically close when you leave the with-block.
a+ means read and append; writes always go to the end of the file regardless of the cursor. r+ opens for reading and writing and writes at the cursor location, but note that it overwrites existing bytes rather than inserting new ones.
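A small demo of how 'r+' writes behave at the cursor (they overwrite in place, they do not insert), using a throwaway file:

```python
# Set up a throwaway file with known content.
with open('demo_rplus.txt', 'w') as f:
    f.write('abcdef')

# 'r+' writes at the cursor and overwrites what is there.
with open('demo_rplus.txt', 'r+') as f:
    f.seek(2)
    f.write('XY')

with open('demo_rplus.txt') as f:
    print(f.read())  # abXYef
```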
Try this:
import fileinput

for line in fileinput.FileInput(your_text_file, inplace=True):
    for key, value in yourdictionary.iteritems():
        line = line.replace(key, key + '\n' + value)
    print line,
Note that str.replace returns a new string, so the result has to be reassigned and printed; with inplace=True everything printed is written back into the file.
After trying a few things mentioned here, and tinkering around with files and dictionaries, I finally came up with this snippet that works for me:
with open("input") as f:
    data_file = f.read()

data = data_file.splitlines()

f = open('output', 'w')
for line in data:
    if line.strip() in b.keys():
        line = line.strip() + '\n' + b[line.strip()].rstrip() + '\n'
        f.writelines(line)
    else:
        f.writelines(line + '\n')
f.close()
where data holds the lines of the original file, and b is my dictionary of keys and values.
I don't know if answering my own question is allowed or not, but it got me the right output, hence posting it anyway.
I have a simple question. I am using this code to store a dictionary of dictionaries in a csv file.
data = dataA, dataB, dataC, dataD
w = csv.writer(open("test.csv", "w"))
for x in data:
    for field, possible_values in x.items():
        print(field, possible_values)
        w.writerow([field, possible_values])
The stored values I got in the csv are stored in rows, but I want them to be stored as columns.
My actual result in csv:
name: Alex
Old: 22
My target in csv should be like this:
Name Old
Alex 22
How can I change it?
Update1:
The clue is for key in x.keys(). After many hours of hard work I updated my code like that, and it works better than before, but I still have a small issue: I need a new line after storing all the keys x.keys() and before the values y.values() of my dictionaries in the csv file.
if not os.path.isfile(filename):
    outfile = open(filename, "w")
    #outfile.write("#Sequence,,,,")

for x in data:
    print(x.keys())
    for key in x.keys():
        print(key)
        store_key = key + ","
        outfile = open(filename, "a")
        outfile.write(store_key)
        outfile.close()

for y in data:
    print(y.values())
    for value in y.values():
        print(value)
        store_value = value + ","
        outfile = open(filename, "a")
        outfile.write(store_value)
        outfile.close()
Now I need to separate the keys and values, maybe with "\n", to get the values of the arrays under the line of keys.
Any help will be appreciated.
If you open and write to the file inside the loop, this will slow down your code. Hold the output text in a string, delete the last comma, and add a newline character to the end of the string. It will speed up your code, and the new line will be added. You can do the same for the second for loop, too.
line = ""
for x in data:
    for key in x.keys():
        line += key + ","
line = line[:-1] + "\n"  # Delete last character and add new line
outfile = open(filename, "a")
outfile.write(line)
outfile.close()
You can write a row of the dictionary keys, then a row of the values.
for x in data:
    w.writerow(x.keys())
    w.writerow(x.values())
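If all the dicts share the same keys, csv.DictWriter can also handle the header-row-then-value-rows layout for you; a sketch with placeholder data in place of dataA, dataB, etc.:

```python
import csv

# Placeholder dicts standing in for dataA, dataB, ...
data = [{'Name': 'Alex', 'Old': 22}, {'Name': 'Sam', 'Old': 30}]

with open('test.csv', 'w') as f:
    writer = csv.DictWriter(f, fieldnames=['Name', 'Old'])
    writer.writeheader()    # one row of keys
    writer.writerows(data)  # one row of values per dict
```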
This is my current code. The issue I have is that search returns nothing. How do I get a string value for this variable?
count = 0
with open("userDatabase.csv", "r") as myFile:
    with open("newFile.csv", "w") as newFile:
        row_count = sum(1 for row in myFile)
        print("aba")
        for x in range(row_count):
            print("aaa")
            for row in myFile:
                search = row[count].readline
                print(search)
                if self.delName.get("1.0", "end-1c") in search:
                    count = count + 1
                else:
                    newFile.write(row[count])
                    count = count + 1
The output is:
aba
aaa
aaa
So it runs through it twice, which is good as my userDatabase consists of two rows of data.
The file in question has this data:
"lukefinney","0000000","0000000","a"
"nictaylor","0000000","0000000","a"
You cannot just iterate over an open file more than once without rewinding the file object back to the start.
You'll need to add a file.seek(0) call to put the file reader back to the beginning each time you want to start reading from the first row again:
myFile.seek(0)
for row in myFile:
The rest of your code makes little sense; when iterating over a file you get individual lines from the file, so each row is a string object. Indexing into strings gives you new strings with just one character in it; 'foo'[1] is the character 'o', for example.
If you wanted to copy across rows that don't match a string, you don't need to know the row count up front at all. You are not handling a list of rows here, you can look at each row individually instead:
filter_string = self.delName.get("1.0", "end-1c")

with open("userDatabase.csv", "r") as myFile:
    with open("newFile.csv", "w") as newFile:
        for row in myFile:
            if filter_string not in row:
                newFile.write(row)
This does a sub-string match. If you need to match whole columns, use the csv module to give you individual columns to match against. The module handles the quotes around column values:
import csv

filter_string = self.delName.get("1.0", "end-1c")

with open("userDatabase.csv", "r", newline='') as myFile:
    with open("newFile.csv", "w", newline='') as newFile:
        writer = csv.writer(newFile)
        for row in csv.reader(myFile):
            # row is now a list of strings, like ['lukefinney', '0000000', '0000000', 'a']
            if filter_string != row[0]:  # test against the first column
                # copied across if the first column does not match exactly.
                writer.writerow(row)
One problem is that row_count = sum(1 for row in myFile) consumes all rows from myFile. Subsequent reads on myFile will return an empty string which signifies end of file. This means that for loop later in your code where you execute for row in myFile: is not entered because all rows have already been consumed.
A way around this is to add a call to myFile.seek(0) just before for row in myFile:. This will reset the file pointer and the for loop should then work.
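The exhaustion-then-rewind behaviour can be seen directly with a throwaway file:

```python
# Counting lines consumes the file; seek(0) rewinds so it can be read again.
with open('demo.csv', 'w') as f:
    f.write('row1\nrow2\n')

with open('demo.csv') as myFile:
    row_count = sum(1 for row in myFile)
    print(row_count)     # 2
    print(list(myFile))  # [] -- the file is exhausted
    myFile.seek(0)
    print(list(myFile))  # ['row1\n', 'row2\n'] -- readable again
```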
It's not very clear from your code what it is that you are trying to do, but it kind of looks like you want to filter out rows that contain a certain string. Try this:
with open("userDatabase.csv", "r") as myFile:
    with open("newFile.csv", "w") as newFile:
        for row in myFile:
            if self.delName.get("1.0", "end-1c") not in row:
                newFile.write(row)
I want to create a csv from an existing csv, by splitting its rows.
Input csv:
A,R,T,11,12,13,14,15,21,22,23,24,25
Output csv:
A,R,T,11,12,13,14,15
A,R,T,21,22,23,24,25
So far my code looks like:
def update_csv(name):
    #load csv file
    file_ = open(name, 'rb')
    #init first values
    current_a = ""
    current_r = ""
    current_first_time = ""
    file_content = csv.reader(file_)
    #LOOP
    for row in file_content:
        current_a = row[0]
        current_r = row[1]
        current_first_time = row[2]
        i = 2
        #Write row to new csv
        with open("updated_" + name, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow((current_a,
                             current_r,
                             current_first_time,
                             ",".join((row[x] for x in range(i+1, i+5)))
                             ))
        #do only one row, for debug purposes
        return
But the row contains double quotes that I can't get rid of:
A002,R051,02-00-00,"05-21-11,00:00:00,REGULAR,003169391"
I've tried to use writer = csv.writer(f,quoting=csv.QUOTE_NONE) and got a _csv.Error: need to escape, but no escapechar set.
What is the correct approach to delete those quotes?
I think you could simplify the logic to split each row into two using something along these lines:
def update_csv(name):
    with open(name, 'rb') as file_:
        with open("updated_" + name, 'wb') as f:
            writer = csv.writer(f)
            # read one row from input csv
            for row in csv.reader(file_):
                # write 2 rows to new csv
                writer.writerow(row[:8])
                writer.writerow(row[:3] + row[8:])
writer.writerow is expecting an iterable such that it can write each item within the iterable as one item, separate by the appropriate delimiter, into the file. So:
writer.writerow([1, 2, 3])
would write "1,2,3\n" to the file.
Your call provides it with an iterable, one of whose items is a string that already contains the delimiter. It therefore needs some way to either escape the delimiter or a way to quote out that item. For example,
writer.writerow([1, '2,3'])
Doesn't just give "1,2,3\n", but e.g. '1,"2,3"\n' - the string counts as one item in the output.
Therefore if you want to not have quotes in the output, you need to provide an escape character (e.g. '/') to mark the delimiters that shouldn't be counted as such (giving something like "1,2/,3\n").
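For completeness, QUOTE_NONE does work once an escapechar is supplied, though the escape character then appears in the output, as described above:

```python
import csv

with open('demo_out.csv', 'w') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='/')
    writer.writerow([1, '2,3'])  # the embedded comma gets escaped, not quoted

with open('demo_out.csv') as f:
    print(f.read())  # 1,2/,3
```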
However, I think what you actually want to do is include all of those elements as separate items. Don't ",".join(...) them yourself; try:
writer.writerow([current_a, current_r,
                 current_first_time] + row[i+1:i+5])
to provide the relevant items from row as separate items in the sequence.
I have a similar problem to this guy: find position of a substring in a string.
The difference is that I don't know what my "mystr" is. I know my substring, but my string in the input file could be a random number of words in any order; I only know that one of those words contains the substring cola.
For example a csv file: fanta,coca_cola,sprite in any order.
If my substring is "cola", then how can I write code that says
mystr.find('cola')
or
match = re.search(r"[^a-zA-Z](cola)[^a-zA-Z]", mystr)
or
if "cola" in mystr
When I don't know what my "mystr" is?
This is my code:
import csv

with open('first.csv', 'rb') as fp_in, open('second.csv', 'wb') as fp_out:
    reader = csv.DictReader(fp_in)
    rows = [row for row in reader]
    writer = csv.writer(fp_out, delimiter=',')
    writer.writerow(["new_cola"])

    def headers1(name):
        if "cola" in name:
            return row.get("cola")

    for row in rows:
        writer.writerow([headers1("cola")])
and the first.csv:
fanta,cocacola,banana
0,1,0
1,2,1
So it prints out
new_cola
""
""
when it should print out
new_cola
1
2
Here is a working example:
import csv

with open("first.csv", "rb") as fp_in, open("second.csv", "wb") as fp_out:
    reader = csv.DictReader(fp_in)
    writer = csv.writer(fp_out, delimiter=",")
    writer.writerow(["new_cola"])

    def filter_cola(row):
        for k, v in row.iteritems():
            if "cola" in k:
                yield v

    for row in reader:
        writer.writerow(list(filter_cola(row)))
Notes:
rows = [row for row in reader] is unnecessary and inefficient (you convert a generator to a list, which consumes a lot of memory for huge data)
instead of return row.get("cola") you meant return row.get(name)
in the statement return row.get("cola") you access the variable row from outside the current scope
You can also use the unix tool cut. For example:
cut -d "," -f 2 < first.csv > second.csv
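Run against the sample first.csv above, cut extracts the second column, header included:

```shell
printf 'fanta,cocacola,banana\n0,1,0\n1,2,1\n' > first.csv
cut -d ',' -f 2 < first.csv > second.csv
cat second.csv
# cocacola
# 1
# 2
```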