Splitting CSV into files - python

I am looking for some guidance with what I am trying to do.
I have a .csv file, and in this file I want to break down each line and save it into its own text file.
I have that part working; however, when it runs I lose the commas. I assume this is happening because I convert the .csv file into a list and then into a text file.
I feel there has to be a better way!
Code
import csv
import os

def createParam():
    with open('testcsv.csv', 'r') as f:
        reader = csv.reader(f)
        csvList = list(reader)
    for item in csvList:
        os.mkdir(r"C:\Users\user\Desktop\Test Path\\" + item[0])
        f = open(r"C:\Users\user\Desktop\Test Path\\" + item[0] + r"\prm.263", "w+")
        f.writelines(item)
        f.close()
CSV
Store1,1080,SafehavenHumaneSociety,2904,LuckyPaws,3156,StMartinsDogRescue,4051,SalemFriendsofFelines,4088,HeartlandHumaneSociety,4118,Fortheloveofacat,6329,PeacefulPack,7710,OneVoice4Paws,7981,KeithasKittieRescue,7984,InternationalReptileRescueInc,9304,SeniorDogRescueOfOregon,9309,LovedAgainPets
Store2,0028,ArizonaAnimalWelfareLeague,0039,HelpingAnimalsLiveOnHALO,1468,MaricopaCountyAnimalCareandControlMCACC,4250,BuckeyeAnimalRescueKennel,5112,MASH,5957,FeathersFoundationInc,6725,ValleyHumaneSociety,7172,KitKatRescue,7627,LuckyDogRscu,7761,AZSmallDog,8114,WhoSavedWhoRescue,9160,DestinationHome,9248,AllAboutAnimals
Clarification: When it creates the file(s), each has all the data, but all the commas are removed, so it's just one long line.
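What is actually happening can be shown in a couple of lines: csv.reader splits each row on the commas, and writelines then concatenates the list items with no separator at all, which is why the commas disappear. A minimal demonstration using in-memory files:

```python
import csv
import io

# csv.reader splits the line on commas, giving a list of fields
row = next(csv.reader(io.StringIO("Store1,1080,SafehavenHumaneSociety\n")))
assert row == ["Store1", "1080", "SafehavenHumaneSociety"]

# writelines writes the list items back with no separators between them
buf = io.StringIO()
buf.writelines(row)
assert buf.getvalue() == "Store11080SafehavenHumaneSociety"
```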

Since each item is a list of values representing a row in the CSV, you should write it as a CSV with csv.writer:
for item in csvList:
    os.mkdir(r"C:\Users\user\Desktop\Test Path\\" + item[0])
    with open(r"C:\Users\user\Desktop\Test Path\\" + item[0] + r"\prm.263", "w+") as f:
        csv.writer(f).writerow(item[1:])
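One detail worth noting: per the csv module documentation, files passed to csv.writer should be opened with newline="", otherwise you get extra blank lines between rows on Windows. A self-contained sketch of the same idea (the base directory and sample rows here are made up for illustration):

```python
import csv
import os

base = "output_dirs"  # placeholder for the Test Path directory
rows = [["Store1", "1080", "SafehavenHumaneSociety"],
        ["Store2", "0028", "ArizonaAnimalWelfareLeague"]]

os.makedirs(base, exist_ok=True)
for item in rows:
    folder = os.path.join(base, item[0])
    os.makedirs(folder, exist_ok=True)
    # newline="" stops the csv module from doubling line endings on Windows
    with open(os.path.join(folder, "prm.263"), "w", newline="") as f:
        csv.writer(f).writerow(item[1:])
```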

I guess you just need to read the file line by line (without loading it as a csv file). Each line goes to its own file.
index = 0
with open('testcsv.csv', 'r') as f:
    for line in f.readlines():
        index += 1
        with open('new_textfile_{}.csv'.format(index), 'w') as f2:
            f2.write(line)
If you want to save the files in some directory X, then the path in the second with open... should be "X/whatever_name_{}.csv".format(index)
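Building that path with os.path.join is a bit more robust than string formatting alone; a small sketch, where "X" is just a placeholder directory name and the sample lines stand in for the file contents:

```python
import os

target_dir = "X"  # placeholder directory name
os.makedirs(target_dir, exist_ok=True)

index = 0
lines = ["a,b,c\n", "d,e,f\n"]  # stands in for f.readlines()
for line in lines:
    index += 1
    # join the directory and file name portably
    path = os.path.join(target_dir, "new_textfile_{}.csv".format(index))
    with open(path, "w") as f2:
        f2.write(line)
```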

Related

Want to append a column in a file without using Pandas

I have a file say, outfile.txt which looks like below:
1,2,3,4,0,0.95
1,2,4,4,0,0.81
5,6,3,1,0,0.89
7,6,8,8,0,0.77
6,6,4,9,0,0.88
9,9,9,1,0,0.66
4,3,6,9,0,0.85
1,2,6,7,0,0.61
Now I want to append one extra 1 to each row. So the desired output file looks like:
1,2,3,4,0,0.95,1
1,2,4,4,0,0.81,1
5,6,3,1,0,0.89,1
7,6,8,8,0,0.77,1
6,6,4,9,0,0.88,1
9,9,9,1,0,0.66,1
4,3,6,9,0,0.85,1
1,2,6,7,0,0.61,1
How can I do it? Whenever I google for a solution, everything I find uses Pandas, but I don't want to use that.
Since your file is in csv format, the csv module can help you. Iterating over the reader object gives you a list of the items in each line of the file; then simply .append() what you want.
import csv

with open("outfile.txt") as f:
    reader = csv.reader(f)
    for line in reader:
        line.append("1")
        print(",".join(line))
If you have a column-like sequence, you can zip it with the reader object and append the corresponding element in the loop:
import csv

column = range(10)
with open("outfile.txt") as f:
    reader = csv.reader(f)
    for line, n in zip(reader, map(str, column)):
        line.append(n)
        print(",".join(line))
I printed the result; you can write it to a new file instead.
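Writing to a new file instead of printing only changes the last step; a sketch of the same loop with csv.writer (the file names here are placeholders):

```python
import csv

# Create a small input file standing in for outfile.txt
with open("outfile_demo.txt", "w") as f:
    f.write("1,2,3,4,0,0.95\n1,2,4,4,0,0.81\n")

with open("outfile_demo.txt", newline="") as f, \
     open("outfile_new.txt", "w", newline="") as out:
    writer = csv.writer(out)
    for line in csv.reader(f):
        line.append("1")        # add the extra column
        writer.writerow(line)   # write instead of print
```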
You can read and write files line by line with the csv module. A reader object will iterate the rows of the input file, and writer.writerows will consume that iterator. You just need a bit of extra code to add the 1. Using a generator expression, this example adds the extra column.
import csv
import os

filename = "outfile.txt"
tmp = filename + ".tmp"
with open(filename, newline="") as infile, open(tmp, "w", newline="") as outfile:
    csv.writer(outfile).writerows(row + [1] for row in csv.reader(infile))
os.rename(tmp, filename)
Just iterate through the file line by line and add ,1 at the end of each line:
with open('outfile.txt', 'r') as input:
    with open('outfile_final.txt', 'w') as output:
        for line in input:
            line = line.rstrip('\n') + ',1'
            print(line, file=output)

Writing the data in a text file while converting it to csv

I am very new to Python. I have a .txt file and want to convert it to a .csv file in the format I was told, but I could not manage to accomplish it. A hand would be helpful. I am going to explain it with screenshots.
I have a txt file named bip.txt, and the data inside of it looks like this
I want to convert it to csv like this csv file
So far, what I could do is only writing all the data from text file with this code:
import glob

read_files = glob.glob("C:/Users/Emrehana1/Desktop/bip.txt")
with open("C:/Users/Emrehana1/Desktop/Test_Result_Report.csv", "w") as outfile:
    for f in read_files:
        with open(f, "r") as infile:
            outfile.write(infile.read())
So is there a solution to convert it to a csv file in the format I desire? I hope I have explained it clearly.
There's no need to use the glob module if you only have one file and you already know its name. You can just open it. It would have been helpful to quote your data as text, since as an image someone wanting to help you can't just copy and paste your input data.
For each entry in the input file you will have to read multiple lines to collect together the information you need to create an entry in the output file.
One way is to loop over the lines of input until you find one that begins with "test:", then get the next line in the file using next() to create the entry:
The following code will produce the split you need; creating the csv file can be done with the standard library csv module and is left as an exercise. I used a different file name, as you can see.
with open("/tmp/blip.txt") as f:
    for line in f:
        if line.startswith("test:"):
            test_name = line.strip().split(None, 1)[1]
            result = next(f)
            if not result.startswith("outcome:"):
                raise ValueError("Test name not followed by outcome for test " + test_name)
            outcome = result.strip().split(None, 1)[1]
            print(test_name, outcome)
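The csv-writing step that is left as an exercise could look roughly like this, assuming you collect the (test_name, outcome) pairs into a list first (the pairs and output file name below are made up for illustration):

```python
import csv

# Stand-in for the (test_name, outcome) pairs parsed from blip.txt
pairs = [("login_check", "pass"), ("timeout_check", "fail")]

with open("Test_Result_Report_demo.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["test", "outcome"])  # header row
    for test_name, outcome in pairs:
        writer.writerow([test_name, outcome])
```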
You do not use the glob function to open a file; it searches for file names matching a pattern. You could open up the file bip.txt, then read each line and put the value into a list. When all of the values have been found, join them with a comma and a newline and write them to a csv file, like this:
# set the csv column headers
values = [["test", "outcome"]]
current_row = []
with open("bip.txt", "r") as f:
    for line in f:
        # when a blank line is found, append the row
        if line == "\n" and current_row != []:
            values.append(current_row)
            current_row = []
        if ":" in line:
            # get the value after the colon
            value = line[line.index(":") + 1:].strip()
            current_row.append(value)
# append the final row to the list
values.append(current_row)
# join the columns with a comma and the rows with a new line
csv_result = ""
for row in values:
    csv_result += ",".join(row) + "\n"

# output the csv data to a file
with open("Test_Result_Report.csv", "w") as f:
    f.write(csv_result)
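One caveat with joining by hand: if a value itself contains a comma, the output is no longer valid csv. csv.writer handles the quoting for you; a sketch of just the final write step (the values list and file name are illustrative):

```python
import csv

values = [["test", "outcome"], ["check, with comma", "pass"]]
with open("Test_Result_Report_quoted.csv", "w", newline="") as f:
    # csv.writer quotes fields that contain the delimiter
    csv.writer(f).writerows(values)
```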

Appending to the end of a certain line

Rather than appending to the end of a file, I am trying to append to the end of a certain line of a .csv file.
I want to do this when the user enters an input that matches the first column of the .csv.
Here's an example:
file = open("class" + classno + ".csv", "r+")
writer = csv.writer(file)
data = csv.reader(file)
for row in data:
    if input == row[0]:
        (APPEND variable TO ROW)
file.close()
Is there a way to do this? Would I have to redefine and then rewrite the file?
You can read the whole file, then change what you need to change and write it back to the file (it's not really writing back; it's a complete overwrite).
Maybe this example will help:
read_data = []
with open('test.csv', 'r') as f:
    for line in f:
        read_data.append(line)

with open('test.csv', 'w') as f:
    for line in read_data:
        key, value = line.split(',')
        new_line = line
        if key == 'b':
            value = value.strip() + 'added\n'
            new_line = ','.join([key, value])
        f.write(new_line)
My test.csv file at start:
key,value
a,1
b,2
c,3
d,4
And after I run that sample code:
key,value
a,1
b,2added
c,3
d,4
It's probably not the best solution with big files.
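The same read-modify-rewrite idea can be done with the csv module, which avoids the manual split/strip/join. A sketch, assuming the match key is in the first column (the demo file, key, and appended value are made up for illustration):

```python
import csv

# Build a small demo file similar to the example above
with open("class_demo.csv", "w", newline="") as f:
    csv.writer(f).writerows([["a", "1"], ["b", "2"], ["c", "3"]])

key = "b"        # stands in for the user's input
extra = "added"  # the value to append to the matching row

# read everything, modify the matching row, overwrite the file
with open("class_demo.csv", newline="") as f:
    rows = list(csv.reader(f))
for row in rows:
    if row[0] == key:
        row.append(extra)
with open("class_demo.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```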

search contents of one file with contents of a second file using python

I have the following code which compares the items on the first column of input file1 with the contents of input file 2:
import os

newfile2 = []
outfile = open("outFile.txt", "w")
infile1 = open("infile1.txt", "r")
infile2 = open("infile2.txt", "r")
for file1 in infile1:
    # print(file1)
    file1 = str(file1).strip().split("\t")
    print(file1[0])
    for file2 in infile2:
        if file2 == file1[0]:
            outfile.write(file2.replace(file2, file1[1]))
        else:
            outfile.write(file2)
input file 1:
Modex_xxR_SL1344_3920 Modex_sseE_SL1344_3920
Modex_seA_hemN Modex_polA_SGR222_3950
Modex_GF2333_3962_SL1344_3966 Modex_ertd_wedS
input file 2:
Sardes_xxR_SL1344_4567
Modex_seA_hemN
MOdex_uui_gytI
Since the input file 1 item (column 1, row 2) matches an item in input file 2 (row 2), then the column 2 item in input file 1 replaces the input file 2 item in the output file as follows (required output):
Sardes_xxR_SL1344_4567
Modex_polA_SGR222_3950
MOdex_uui_gytI
So far my code only outputs the items in input file 1. Can someone help modify this code? Thanks
Looks like you have a tsv file, so let's go ahead and treat it as such. We'll build a tsv reader csv.reader(fileobj, delimiter="\t") that will iterate through infile1 and build a translation dict from it. The dictionary will have keys of the first column and values of the second column per row.
Then using dict.get we can translate the line from infile2 if it exists in our translation dict, or just write the line itself if there's no translation available.
import csv

with open("infile1.txt", 'r') as infile1, \
     open('infile2.txt', 'r') as infile2, \
     open('outfile.txt', 'w') as outfile:
    trans_dict = dict(csv.reader(infile1, delimiter="\t"))
    for line in infile2:
        outfile.write(trans_dict.get(line.strip(), line.strip()) + "\n")
Result:
# contents of outfile.txt
Sardes_xxR_SL1344_4567
Modex_polA_SGR222_3950
MOdex_uui_gytI
EDIT as per your comment:
import csv

with open("infile1.txt", 'r') as infile1:
    # build our translation dict
    trans_dict = dict(csv.reader(infile1, delimiter="\t"))

with open("infile2.txt", 'r') as infile2, \
     open("outfile.txt", 'w') as outfile:
    # open the file to translate and our output file
    reader = csv.reader(infile2, delimiter="\t")
    # treat the file to translate like a tsv file instead of flat text
    for line in reader:
        # map each column through trans_dict, writing the whole row
        # back re-tab-delimited with a trailing newline
        outfile.write("\t".join(trans_dict.get(col, col) for col in line) + "\n")

How can I speed up this really basic python script for offsetting lines of numbers

I have a simple text file which contains numbers in ASCII text separated by spaces as per this example.
150604849
319865.301865 5810822.964432 -96.425797 -1610
319734.172256 5810916.074753 -52.490280 -122
319730.912949 5810918.098465 -61.864395 -171
319688.240891 5810889.851608 -0.339890 -1790
*<continues like this for millions of lines>*
Basically, I want to copy the first line as is; then for all following lines I want to offset the first value (x), offset the second value (y), leave the third value unchanged, and offset and halve the last number.
I've cobbled together the following code as a Python learning experience (apologies if it's crude and offensive; truly, I mean no offence) and it works OK. However, the input file I'm using it on is several GB in size, and I'm wondering if there are ways to speed up the execution. Currently a 740 MB file takes 2 minutes 21 seconds.
import glob

# offset values
offsetx = -306000
offsety = -5806000

files = glob.glob('*.pts')
for file in files:
    currentFile = open(file, "r")
    out = open(file[:-4] + "_RGB_moved.pts", "w")
    firstline = str(currentFile.readline())
    out.write(str(firstline.split()[0]))
    while 1:
        lines = currentFile.readlines(100000)
        if not lines:
            break
        for line in lines:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0]) + offsetx), str(float(words[1]) + offsety),
                        str(float(words[2])), str((int(words[3]) + 2050) / 2)]
            out.write(" ".join(newwords))
Many thanks
Don't use .readlines(). Use the file directly as an iterator:
for file in files:
    with open(file, "r") as currentfile, open(file[:-4] + "_RGB_moved.pts", "w") as out:
        firstline = next(currentfile)
        out.write(firstline.split(None, 1)[0])
        for line in currentfile:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0]) + offsetx), str(float(words[1]) + offsety),
                        words[2], str((int(words[3]) + 2050) // 2)]
            out.write(" ".join(newwords))
I also added a few Python best practices, and you don't need to turn words[2] into a float and then back into a string again.
You could also look into using the csv module, it can handle splitting and rejoining lines in C code:
import csv

for file in files:
    with open(file, "r", newline="") as currentfile, \
         open(file[:-4] + "_RGB_moved.pts", "w", newline="") as out:
        reader = csv.reader(currentfile, delimiter=' ', quoting=csv.QUOTE_NONE)
        writer = csv.writer(out, delimiter=' ', quoting=csv.QUOTE_NONE)
        writer.writerow(next(reader)[:1])
        for row in reader:
            newrow = [str(float(row[0]) + offsetx), str(float(row[1]) + offsety),
                      row[2], str((int(row[3]) + 2050) // 2)]
            writer.writerow(newrow)
Use the csv module. It may be more optimized than your script and will simplify your code.
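A rough sketch of what that could look like, using the offsets from the question (shift_file and the file names here are just illustrative):

```python
import csv

# Offsets copied from the question; file names below are placeholders.
offsetx = -306000
offsety = -5806000

def shift_file(src, dst):
    """Copy the header's first field, then offset x/y and halve the last column."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter=" ", quoting=csv.QUOTE_NONE)
        writer = csv.writer(fout, delimiter=" ", quoting=csv.QUOTE_NONE)
        writer.writerow(next(reader)[:1])  # first field of the header, as in the original
        for row in reader:
            writer.writerow([
                str(float(row[0]) + offsetx),
                str(float(row[1]) + offsety),
                row[2],                          # unchanged
                str((int(row[3]) + 2050) // 2),  # offset and halve
            ])

# Tiny demo input standing in for a multi-GB .pts file
with open("pts_demo.txt", "w") as f:
    f.write("150604849\n319865.301865 5810822.964432 -96.425797 -1610\n")
shift_file("pts_demo.txt", "pts_demo_moved.txt")
```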
