I'm trying to write the result of a function in a csv. Unfortunately, no pandas.
csv file input:
Hello all well?
today is cold!
I have not had lunch yet
He does not have many brothers or sisters.
We are sick
Script:
import re
import csv
import string

with open('teste_csv.csv', 'r') as f:
    file = csv.reader(f)
    for line in file:
        message = ''.join(line)

        def toto(message):
            message = message.lower()
            p = re.compile('|'.join(map(re.escape, string.punctuation)))
            no_punct = p.sub(' ', message)
            writer = csv.writer(open('result.csv', 'w'))
            for row in no_punct:
                writer.writerow(row)
            return writer

        print(toto(message))
In my terminal I get <_csv.writer object at 0x7fee60e57c50>, and result.csv contains only one line, 'w'. I would like each line to end up in my result.csv.
You keep erasing the file: every time you call toto it opens result.csv for writing again, so you are left with only a single write. You need to open the file once and create the writer once. You also only need to define the function once, for that matter:
import re
import csv
import string

def toto(message, writer):
    message = message.lower()
    p = re.compile('|'.join(map(re.escape, string.punctuation)))
    no_punct = p.sub(' ', message)
    for row in no_punct:
        writer.writerow(row)

with open('teste_csv.csv', 'r') as f:
    writer = csv.writer(open('result.csv', 'w'))
    file = csv.reader(f)
    for line in file:
        message = ''.join(line)
        toto(message, writer)
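As a side note (an addition, not part of the answer above), the output file can also be opened in the same with statement so it is flushed and closed automatically, and writerow can be given a one-element list so that each input line becomes exactly one row in result.csv, which seems to be what the question is after. A minimal sketch under those assumptions:
import re
import csv
import string

def toto(message, writer):
    # lowercase the message and replace punctuation with spaces
    message = message.lower()
    p = re.compile('|'.join(map(re.escape, string.punctuation)))
    no_punct = p.sub(' ', message)
    writer.writerow([no_punct])  # one row per input line

with open('teste_csv.csv', 'r') as f, open('result.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for line in csv.reader(f):
        toto(''.join(line), writer)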
You need to put the writer outside of your first loop; each time you loop through, the file is reopened and rewritten.
Another issue is that you are defining and calling toto inside the loop, so effectively only the last message's value ends up in the file.
import re
import csv
import string

with open('test.csv', 'r') as f:
    file = csv.reader(f)
    writer = csv.writer(open('result.csv', 'w'))

    def toto(message):
        message = message.lower()
        p = re.compile('|'.join(map(re.escape, string.punctuation)))
        no_punct = p.sub(' ', message)
        for row in no_punct:
            writer.writerow(row)
        return writer

    for line in file:
        print(line)
        message = ''.join(line)
        print(toto(message))
My program takes a csv file as input and writes it as an output file in json format. On the final line, I use the print command to output the contents of the json format file to the screen. However, it does not print out the json file contents and I don't understand why.
Here is my code that I have so far:
import csv
import json

def jsonformat(infile, outfile):
    contents = {}
    csvfile = open(infile, 'r')
    reader = csvfile.read()
    for m in reader:
        key = m['No']
        contents[key] = m
    jsonfile = open(outfile, 'w')
    jsonfile.write(json.dumps(contents))
    csvfile.close()
    jsonfile.close()
    return jsonfile

infile = 'orders.csv'
outfile = 'orders.json'
output = jsonformat(infile, outfile)
print(output)
Your function returns the jsonfile variable, which is a file.
Try adding this:
jsonfile.close()
with open(outfile, 'r') as file:
    return file.read()
Your function returns the file handle jsonfile, and that is what you then print. Instead, return the contents that you wrote to the file. Since you opened the file in w mode, any previous contents are removed before the new contents are written, so the file's contents are exactly what you just wrote to it.
In your function, do:
def jsonformat(infile, outfile):
    ...
    # Instead of this:
    # jsonfile.write(json.dumps(contents))
    # do this:
    json_contents = json.dumps(contents, indent=4)  # indent=4 to pretty-print
    jsonfile.write(json_contents)
    ...
    return json_contents
Aside from that, you aren't reading the CSV file the correct way. If your file has a header, you can use csv.DictReader to read each row as a dictionary; then you'll be able to use for m in reader: and key = m['No']. Change reader = csvfile.read() to reader = csv.DictReader(csvfile).
As of now, reader is a string that contains the entire contents of your file, so for m in reader makes m each character of that string, and you cannot access the 'No' key on a character. A sketch of the function with both fixes applied follows.
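Putting the two fixes together, a minimal sketch of the corrected function might look like this (it assumes the CSV has a header row containing a 'No' column, as the original code implies):
import csv
import json

def jsonformat(infile, outfile):
    contents = {}
    with open(infile, 'r', newline='') as csvfile:
        reader = csv.DictReader(csvfile)  # each row becomes a dict keyed by the header
        for m in reader:
            contents[m['No']] = m
    json_contents = json.dumps(contents, indent=4)
    with open(outfile, 'w') as jsonfile:
        jsonfile.write(json_contents)
    return json_contents  # return the text that was written, not the file handle

print(jsonformat('orders.csv', 'orders.json'))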
import json

a_file = open("sample.json", "r")
a_json = json.load(a_file)
pretty_json = json.dumps(a_json, indent=4)
a_file.close()
print(pretty_json)
You can use this sample to print the contents of your json file. Have a good day.
I have a text file with names and results. If the name already exists, only the result should be updated. I tried with this code and many others, but without success.
The content of the text file looks like this:
Ann, 200
Buddy, 10
Mark, 180
Luis, 100
PS: I started 2 weeks ago, so don't judge my bad code.
from os import rename

def updatescore(username, score):
    file = open("mynewscores.txt", "r")
    new_file = open("mynewscores2.txt", "w")
    for line in file:
        if username in line:
            splitted = line.split(",")
            splitted[1] = score
            joined = "".join(splitted)
            new_file.write(joined)
        new_file.write(line)
    file.close()
    new_file.close()

maks = updatescore("Buddy", "200")
print(maks)
I would suggest reading the csv in as a dictionary and just updating the one value.
import csv

d = {}
with open('test.txt', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        key, value = row
        d[key] = value

d['Buddy'] = 200

with open('test2.txt', 'w', newline='') as f:
    writer = csv.writer(f)
    for key, value in d.items():
        writer.writerow([key, value])
What mostly needed to change is that your for loop writes line to the new text file unconditionally, and nothing tells it not to do that when a score is being replaced. All that was needed was an else statement below the if statement:
from os import rename

def updatescore(username, score):
    file = open("mynewscores.txt", "r")
    new_file = open("mynewscores2.txt", "w")
    for line in file:
        if username in line:
            splitted = line.split(",")
            splitted[1] = score
            print(splitted)
            joined = ", ".join(splitted)
            print(joined)
            new_file.write(joined + '\n')
        else:
            new_file.write(line)
    file.close()
    new_file.close()

maks = updatescore("Buddy", "200")
print(maks)
You can try this: it adds the username if it doesn't exist, and otherwise updates it.
def updatescore(username, score):
    with open("mynewscores.txt", "r+") as file:
        line = file.readline()
        while line:
            if username in line:
                file.seek(file.tell() - len(line))
                file.write(f"{username}, {score}")
                return
            line = file.readline()
        file.write(f"\n{username}, {score}")

maks = updatescore("Buddy", "300")
maks = updatescore("Mario", "50")
You have new_file.write(joined) inside the if block, which is good, but you also have new_file.write(line) outside the if block.
Outside the if block it writes both the original and the fixed line into the file, and since write() does not add a \n newline character, both versions end up on the same line.
You also want to add the comma back: joined = ','.join(splitted), since you took the commas out when you used line.split(',').
I got the result you seem to be expecting when I put in both of these fixes; a minimal sketch of the loop with them applied follows below.
Next time you should include the output you are expecting and the input you are giving. It would also help to include the error or result you actually got.
Welcome to Python, BTW.
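For reference, a minimal sketch of the loop with both of those fixes applied (same idea and file names as the fix above, just condensed into a with block):
def updatescore(username, score):
    with open("mynewscores.txt", "r") as old, open("mynewscores2.txt", "w") as new:
        for line in old:
            if username in line:
                splitted = line.split(",")
                splitted[1] = score
                new.write(",".join(splitted) + "\n")  # comma restored, newline added
            else:
                new.write(line)  # lines without the name are copied unchanged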
Here is your code with the issues removed:
def updatescore(username, score):
    file = open("mynewscores.txt", "r")
    new_file = open("mynewscores2.txt", "w")
    for line in file.readlines():
        splitted = line.split(",")
        if username == splitted[0].strip():
            splitted[1] = str(score)
            joined = ",".join(splitted)
            new_file.write(joined + "\n")  # re-append the newline lost when splitted[1] was replaced
        else:
            new_file.write(line)
    file.close()
    new_file.close()
I believe this is the simplest/most straightforward way of doing things.
Code:
import csv

def update_score(name: str, score: int) -> None:
    with open('../resources/name_data.csv', newline='') as file_obj:
        reader = csv.reader(file_obj)
        data_dict = dict(curr_row for curr_row in reader)
    data_dict[name] = score
    with open('../out/name_data_out.csv', 'w', newline='') as file_obj:
        writer = csv.writer(file_obj)
        writer.writerows(data_dict.items())

update_score('Buddy', 200)
Input file:
Ann,200
Buddy,10
Mark,180
Luis,100
Output file:
Ann,200
Buddy,200
Mark,180
Luis,100
I have a sample_vsdt.txt file containing SHA-1 hashes and descriptions, like this:
Scanning samples_extracted\02b809d4edee752d9286677ea30e8a76114aa324->(Microsoft RTF 6008-0)
->Found Virus [Possible_SCRDL]
Scanning samples_extracted\0349e0101d8458b6d05860fbee2b4a6d7fa2038d->(Adobe Portable Document Format(PDF) 6015-0)
->Found Virus [TROJ_FRS.VSN11I18]
Example:
SHA-1: 02b809d4edee752d9286677ea30e8a76114aa324
Description:(Microsoft RTF 6008-0)
Problem:
My task is to take the SHA-1 and Description entries from my txt file and list them in a csv file. I was able to do that using a regex, a prefix and a delimiter. However, this example is what makes it hard for me:
Scanning samples_extracted\0191a23ee122bdb0c69008971e365ec530bf03f5
- Invoice_No_94497.doc->Found Virus [Trojan.4FEC5F36]->(MIME 6010-0)
- Found 1/3 Viruses in samples_extracted\0191a23ee122bdb0c69008971e365ec530bf03f5
It has a different line pattern, and I only want to get the SHA-1 from the first line (not the fourth line) and the description from the second line.
Output:
The output went wrong because the description (MIME 6010-0) was put in the SHA-1 column.
0191a23ee122bdb0c69008971e365ec530bf03f5
(MIME 6010-0)
02b809d4edee752d9286677ea30e8a76114aa324 (Microsoft RTF 6008-0)
0349e0101d8458b6d05860fbee2b4a6d7fa2038d (Adobe Portable Document Format(PDF) 6015-0)
035a7afca8b72cf1c05f6062814836ee31091559 (Adobe Portable Document Format(PDF) 6015-0)
Code
import csv
import re

INPUTFILE = 'samples_vsdt.txt'
OUTPUTFILE = 'output.csv'

PREFIX = '\\'
DELIMITER = '->'
DELIMITER2 = ']->'
PREFIX2 = ' - '

def read_text_file(inputfile):
    data = []
    with open(inputfile, 'r') as f:
        lines = f.readlines()
        for line in lines:
            line = line.rstrip('\n')
            if re.search(r'[a-zA-Z0-9]{40}', line) and not "Found" in line:  # <----
                line = line.split(PREFIX, 1)[-1]
                parts = line.split(DELIMITER)
                data.append(parts)
            else:
                if "->(" in line and "Found" in line:
                    matched_words = (re.search(r'\(.*?\)', line))
                    sha = (re.search(r'[a-zA-Z0-9]{40}', line))
                    if matched_words != None:
                        matched_words = matched_words.group()
                        matched_words = matched_words.split("]->")
                        data.append(matched_words)
                        #data.append(parts)
    return data

def write_csv_file(data, outputfile):
    with open(outputfile, 'wb') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',', quotechar='"')
        for row in data:
            csvwriter.writerow(row)

def main():
    data = read_text_file(INPUTFILE)
    write_csv_file(data, OUTPUTFILE)

if __name__ == '__main__':
    main()
Here is the full content of my text file:
sample_vsdt.txt
I changed some of the logic; maybe I can give you some different ideas.
Basically, it checks whether the string Scanning samples_extracted appears on a line together with (, which means the description is on the same line as the SHA.
Otherwise, Scanning samples_extracted on its own means the description is on a following line (in your example there are some blank lines, so I had to add a while loop).
It prints the result; cherry-pick the logic and put it in your program. A sketch of writing the extracted pairs to CSV follows the snippet below.
import re

with open("vCjjGQxe.txt") as f:
    for line in f:
        if "Scanning samples_extracted" in line and "(" in line:
            sha = re.search('\\\(.*)->', line).group(1)
            desc = re.search('->\((.*)\)', line).group(1)
            print("SHA-1:", sha)
            print("Description:", desc)
            continue
        if "Scanning samples_extracted" in line:
            sha = re.search('\\\(.*)$', line).group(1)
            while True:
                i = next(f)
                if "(" in i:
                    desc = re.search('->\((.*)\)', i).group(1)
                    break
            print("SHA-1:", sha)
            print("Description:", desc)
I am able to change the data to lowercase and remove all the punctuation, but I have trouble saving the corrected data in a CSV file.
import csv
import re
import os

input_file = raw_input("Name of the CSV file:")
output_file = raw_input("Output Name:")
reg_test = input_file

result = ''

with open(input_file, 'r') as csvfile:
    with open(output_file, 'w') as csv_out_file:
        filereader = csv.reader(csvfile)
        filewriter = csv.writer(csv_out_file)
        for row in filereader:
            row = re.sub('[^A-Za-z0-9]+', '', str(row))
            result += row + ','
        lower = (result).lower()

csvfile.close()
csv_out_file.close()
You do not have to close the files; this is done automatically when the context of the with statement ends. You also have to actually write something after you create the csv.writer, e.g. with writerow:
import csv
import re

input_file = 'in.csv'
output_file = 'out.csv'

with open(input_file, 'r') as csvfile, open(output_file, 'w') as csv_out_file:
    filereader = csv.reader(csvfile)
    filewriter = csv.writer(csv_out_file)
    for row in filereader:
        new_row = re.sub('[^A-Za-z0-9]+', '', str(row))  # manipulate the row
        filewriter.writerow([new_row.lower()])  # write the new row to the out file
# the files are closed automatically after the context of the with statement is over
This saves the manipulated content of the first csv file to the second.
I have text that is key-value pairs separated by '='. I would like to replace the line if the key matches; if not, I would like to append it at the bottom. I've tried several ways, including:
def split_command_key_and_value(command):
    if '=' in command:
        command2 = command.split('=')
        return command2

def test(command, path):
    command2 = split_command_key_and_value(command)
    pattern = command2[0]
    myfile = open(path, 'r')  # open file handle for read
    # use r'', you don't need to replace '\' with '/'
    result = open(path, 'w')  # open file handle for write
    for line in myfile:
        line = line.strip()  # it's always good practice to strip what you read from files
        if pattern in line:
            line = command  # if match, replace line
        result.write(line)  # write every line
    myfile.close()  # don't forget to close file handle
    result.close()
I know the above is just to replace text, but it deletes the text in the file, and I can't see why. Could someone point me in the right direction?
Thanks
Update:
I'm almost there, but some of my lines have similar keys, so multiple lines match when only one should. I've tried to incorporate a regex boundary in my loop with no luck. My code is below. Does anyone have a suggestion?
There is some text in the file that isn't key-value, so I would like to skip that.
def modify(self, name, value):
    comb = name + ' ' + '=' + ' ' + value + '\n'
    with open('/file/', 'w') as tmpstream:
        with open('/file/', 'r') as stream:
            for line in stream:
                if setting_name in line:
                    tmpstream.write(comb)
                else:
                    tmpstream.write(line)
I think I got it. See code below.
def modify(self, name, value):
    comb = name + ' ' + '=' + ' ' + value + '\n'
    mylist = []
    with open('/file/', 'w') as tmpstream:
        with open('/file/', 'r') as stream:
            for line in stream:
                a = line.split()
                b = re.compile('\\b' + name + '\\b')
                if len(a) > 0:
                    if b.search(a[0]):
                        tmpstream.write(comb)
                    else:
                        tmpstream.write(line)
I spoke too soon. It stops at the key-value I provide. So, it only writes one line, and doesn't write the lines that don't match.
def modify(name, value):
    comb = name + ' ' + '=' + ' ' + value + '\n'
    mylist = []
    with open('/file1', 'w') as tmpstream:
        with open('/file2', 'r') as stream:
            for line in stream:
                a = line.split()
                b = re.compile('\\b' + name + '\\b')
                if len(a) > 0:
                    if b.search(a[0]):
                        tmpstream.write(comb)
                    else:
                        tmpstream.write(line)
Can anyone see the issue?
Because when you open the file for writing
result = open(path, 'w')  # open file handle for write
you erase its content. Try writing to a different file and, once all the work is done, replace the old file with the new one (a sketch of that approach follows the snippet below). Or read all the data into memory first, then process it and write it back to the file:
with open(path) as f:
    data = f.read()

with open(path, 'w') as f:
    for l in data.splitlines():
        # do the work on each line here, then write it back
        f.write(l + '\n')
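And a minimal sketch of the first suggestion (write to a separate file, then swap it in); new_path is just a hypothetical name for the temporary output:
import os

def replace_matching_line(path, pattern, command):
    new_path = path + '.tmp'  # temporary file next to the original
    with open(path, 'r') as src, open(new_path, 'w') as dst:
        for line in src:
            if pattern in line:
                dst.write(command + '\n')  # replace the matching line
            else:
                dst.write(line)  # copy everything else unchanged
    os.replace(new_path, path)  # swap the new file in over the old one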
First of all, you are reading and writing the same file...
You could first read it all and then write it line by line:
with open(path, 'r') as f:
    myfile = f.read()  # read everything in the variable "myfile"

result = open(path, 'w')  # open file handle for write
for line in myfile.splitlines():  # process the original file content 1 line at a time
    # as before
I strongly recommend reading python's documentation on how to read and write files.
If you open an existing file in write-mode open(path, 'w'), its content will be erased:
mode can be (...) 'w' for only writing (an existing file with the same name will be erased)
To replace a line in Python you can have a look at this: Search and replace a line in a file in Python
Here is one of the solutions provided there, adapted to your context (tested with Python 3):
from tempfile import mkstemp
from shutil import move
from os import close

def test(filepath, command):
    # Split command into key/value
    key, _ = command.split('=')
    matched_key = False

    # Create a temporary file
    fh, tmp_absolute_path = mkstemp()
    with open(tmp_absolute_path, 'w') as tmp_stream:
        with open(filepath, 'r') as stream:
            for line in stream:
                if key in line:
                    matched_key = True
                    tmp_stream.write(command + '\n')
                else:
                    tmp_stream.write(line)
        if not matched_key:
            tmp_stream.write(command + '\n')
    close(fh)
    move(tmp_absolute_path, filepath)
Note that with the code above, every line that contains key anywhere in it (key=blob as well as blob=key) will be replaced. A stricter check that only looks at the part before '=' is sketched below.
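A minimal sketch of such a stricter match, comparing only the text before the first '=' (this is an addition, not part of the linked answer):
def line_matches_key(line, key):
    # Treat the line as a match only when the part before the first '='
    # is exactly the key (ignoring surrounding whitespace).
    if '=' not in line:
        return False  # skip lines that are not key-value pairs
    return line.split('=', 1)[0].strip() == key

# then, inside the loop above, use
#     if line_matches_key(line, key):
# instead of
#     if key in line: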