I'm trying to import data from X CSV files (6 in this case) containing some data about texts, and write one specific column from each document into a new file, so that the columns appear next to each other (the export from document 1 in column 1, from document 2 in column 2, and so on). I've been unsuccessful so far.
import csv

# I have a list containing the paths to all relevant files
files = ["path1", "path2", ...]
# then I tried cycling through the files like this
for file in files:
    with open(file, "r") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        for row in reader:
            # I'm interested in the data stored in column 2
            print(row[2])
# as you can see, I can get the info from the files, but from here
# on, I can't find a way to write that information to the
# appropriate column of the newly created CSV file
I know how to open a writer; what I don't know is how to write a script that writes the info it fetches from the original 6 documents into a DIFFERENT COLUMN each time a new file is processed.
# I have a list containing the paths to all relevant files
files = ["path1", "path2", ...]
newfile = "newpath1"
# then I tried cycling through the files like this
for file in files:
    with open(file, "r") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        with open(newfile, "a") as wcsvfile:
            writer = csv.writer(wcsvfile)
            for row in reader:
                # I'm interested in the data stored in column 2
                writer.writerow([row[2]])
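The missing piece is a transpose: collect column 2 from every file first, then write the lists side by side. A minimal sketch of that approach (the file names here are placeholders, and note that zip() truncates to the shortest column):

```python
import csv

# build two tiny sample inputs (placeholder stand-ins for the real six files)
for name, vals in [("doc1.csv", ["a", "b"]), ("doc2.csv", ["x", "y"])]:
    with open(name, "w", newline="") as f:
        csv.writer(f).writerows([["c0", "c1", v] for v in vals])

files = ["doc1.csv", "doc2.csv"]
columns = []
for path in files:
    with open(path, newline="") as csvfile:
        reader = csv.reader(csvfile)
        # collect column index 2 of every row in this file
        columns.append([row[2] for row in reader])

with open("combined.csv", "w", newline="") as out:
    # zip(*columns) transposes: file N's values land in output column N
    csv.writer(out).writerows(zip(*columns))
```

If the files have different row counts, itertools.zip_longest with a fillvalue can pad the shorter columns instead of truncating.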
I have 3 CSV files that I need to merge.
All 3 files share the same first three columns (firstname, secondname, age), but the remaining columns differ between files.
I am new to Python, so I need assistance with this. I can follow any code written. Thanks.
I have tried some code, but it isn't working.
file 1
firstname,secondname,age,address,postcode,height
gdsd,gas,uugd,gusa,uuh,hhuuw
kms,kkoil,jjka,kja,kaja,loj
iiow,uiuw,iue,oijw,uow,oiujw
ujis,oiiw,ywuq,sax,cxv,ywf
file 2
firstname,secondname,age,home-town,spousename,marital_staus
gdsd,gas,uugd,vbs,owu,nsvc
kms,kkoil,jjka,kja,kaja,loj
iiow,uiuw,iue,xxfaf,owuq,pler
ujis,oiiw,ywuq,gfhd,lzac,oqq
file 3
firstname,secondname,age,drive,educated,
gdsd,gas,uugd,no,yes
kms,kkoil,jjka,no,no
iiow,uiuw,iue,yes,no
ujis,oiiw,ywuq,yes,yes
desired result
firstname,secondname,age,hometown,spousename,marital_status,adress,post_code,height,drive,educated
note that firstname, secondname, and age are the same across the 3 tables
I need working code, please.
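Since the three files share the firstname, secondname, and age columns, the desired result is a join on those three columns. A minimal sketch with the standard csv module (the file names file1.csv-file3.csv and the one-row sample contents are assumptions for illustration):

```python
import csv

# tiny sample inputs shaped like the question's files (names are assumptions)
samples = {
    "file1.csv": "firstname,secondname,age,address\ngdsd,gas,uugd,gusa\n",
    "file2.csv": "firstname,secondname,age,home-town\ngdsd,gas,uugd,vbs\n",
    "file3.csv": "firstname,secondname,age,drive\ngdsd,gas,uugd,no\n",
}
for name, text in samples.items():
    with open(name, "w", newline="") as f:
        f.write(text)

key_cols = ["firstname", "secondname", "age"]
merged = {}                       # (firstname, secondname, age) -> combined row dict
fieldnames = list(key_cols)
for path in ["file1.csv", "file2.csv", "file3.csv"]:
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = tuple(row[c] for c in key_cols)
            # rows with the same key merge into one combined record
            merged.setdefault(key, {}).update(row)
            # remember each new column name once, in order of first appearance
            for c in row:
                if c not in fieldnames:
                    fieldnames.append(c)

with open("merged.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(merged.values())
```

DictWriter fills columns missing from a record with an empty string, so people absent from one of the files still get a well-formed row.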
Here's a generic solution for concatenating CSV files with heterogeneous headers in Python.
First, read the header of each CSV file to determine the "unique" field names.
Then read each input record and output it, transforming it to match the new header (the union of all fields).
#!/usr/bin/env python3
import csv

paths = ['file1.csv', 'file2.csv', 'file3.csv']

# collect the union of all field names
# (note: a set gives an arbitrary column order; sort it if you need a stable order)
fieldnames = set()
for p in paths:
    with open(p, 'r') as f:
        reader = csv.reader(f)
        fieldnames.update(next(reader))

with open('combined.csv', 'w') as o:
    writer = csv.DictWriter(o, fieldnames=fieldnames)
    writer.writeheader()
    for p in paths:
        with open(p, 'r') as f:
            reader = csv.DictReader(f)
            writer.writerows(reader)
Remark: I open the files twice, so this won't work for inputs that are streams (e.g. sys.stdin).
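If the inputs really are one-shot streams, one workaround is to buffer the parsed rows in memory during a single pass, trading memory for re-readability. A sketch under that assumption (io.StringIO stands in for real streams here):

```python
import csv
import io

# simulate two one-shot streams (stand-ins for e.g. sys.stdin)
streams = [
    io.StringIO("a,b\n1,2\n"),
    io.StringIO("b,c\n3,4\n"),
]

fieldnames = []
buffered = []  # one list of row-dicts per input, each stream read exactly once
for s in streams:
    rows = list(csv.DictReader(s))
    buffered.append(rows)
    # record new column names in order of first appearance
    for name in (rows[0] if rows else {}):
        if name not in fieldnames:
            fieldnames.append(name)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames)
writer.writeheader()
for rows in buffered:
    writer.writerows(rows)

print(out.getvalue())
```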
I have a folder that contains 60 folders, each of which contains about 60 CSVs (and 1 or 2 non-CSVs).
I need to compare the header rows of all of these CSVs, so I am trying to walk the directories and write to an output CSV (1) the file path of the file in question and (2) its header row in the subsequent cells of the same output row.
Then go to the next file, and write the same information in the next row of the output CSV.
I am lost in the part where I am writing the header rows to the CSV -- and am too lost to have even generated an error message.
Can anyone advise on what to do next?
import os
import sys
import csv

csvfile = '/Users/username/Documents/output.csv'

def main(args):
    # Open a CSV for writing outputs to
    with open(csvfile, 'w') as out:
        writer = csv.writer(out, lineterminator='\n')
        # Walk through the directory specified on the command line
        for root, dirs, files in os.walk(args):
            for item in files:
                # Check if the item is a CSV
                if item.endswith('.csv'):
                    # If yes, read the first row
                    with open(item, newline='') as f:
                        reader = csv.reader(f)
                        row1 = next(reader)
                    # Write the first cell as the file name
                    f.write(os.path.realpath(item))
                    f.write(f.readline())
                    f.write('\n')
                    # Write this row to a new line in the csvfile var
                    # Go to next file
                # If not a CSV, go to next file
                else:
                    continue
        # Write each file to the CSV
        # writer.writerow([item])

if __name__ == '__main__':
    main(sys.argv[1])
IIUC you need a new csv file with 2 columns: file_path and headers.
If the header that you need is just a list of column names from that csv, then it will be easier if you use a pandas dataframe to store these values first and then write the dataframe to a csv.
import os
import pandas as pd

res = []
for root, dirs, files in os.walk(args):
    for item in files:
        # Check if the item is a CSV
        if item.endswith('.csv'):
            # If yes, read just the header row (nrows=0 skips the data)
            path = os.path.join(root, item)
            df = pd.read_csv(path, nrows=0)
            row = {}
            row['file_path'] = os.path.realpath(path)
            row['headers'] = list(df.columns)
            res.append(row)

res_df = pd.DataFrame(res)
res_df.to_csv(csvfile, index=False)
You seem to be getting confused between which file you're reading and writing to. Confusion is normal when you try to do everything in one big function. The whole point of functions is to break things down so it's easy to follow, understand and debug.
Here is some code (untested), but you can easily print out what each function returns, and once you know that's correct, you feed it to the next function. Each function is small, with very few variables, so not much can go wrong.
And most importantly, the variables in each function are local to it, meaning they cannot interfere with what's happening elsewhere, or even confuse you into thinking they might be interfering (and that makes a huge difference).
import os
import sys
import csv

def collect_csv_data(directory):
    results = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.csv'):
                headers = extract_headers(os.path.join(root, file))
                results.append((file, headers))
    return results

def extract_headers(filepath):
    with open(filepath) as f:
        reader = csv.reader(f)
        headers = next(reader)
    return headers

def write_results(results, filepath):
    with open(filepath, 'w') as f:
        writer = csv.writer(f)
        for result in results:
            # file name first, then each header in its own cell
            writer.writerow([result[0]] + result[1])

if __name__ == '__main__':
    directory = sys.argv[1]
    results = collect_csv_data(directory)
    write_results(results, 'results.csv')
I would like to create files on the fly and append values to the corresponding columns of each CSV file in real time.
How can I add columns to each of the CSV files that this program generates?
Here is the code I'm using now.
import csv

for i in range(10):
    SD = "Save datas(Angle)" + str(i)  # designate an array for each angle
    SDArray1 = str(SD)                 # build the file name
    f = open(SDArray1 + ".csv", "a+t") # create the file with the generated name
    csv_writer = csv.writer(f)
    csv_writer.writerow([SD])
    print("One loop has started")
    f.close()
    for i in range(1, 5):
        cdata = [i]
        f = open(SDArray1 + ".csv", "a+t")
        csv_writer = csv.writer(f)
        csv_writer.writerow(cdata)
        print(cdata)
        f.close()
    print("loop's finished!")
If you run the code above, the files are created. That part works, but I was wondering how to add columns to the files.
csv.writer's writerow() takes a complete row of columns - if you need more columns, add them to your cdata=[i], e.g. cdata=[i, i*2, i*3, i*4].
You should use with open() as f: for file manipulation; it is more resilient against errors and automatically closes the file when leaving the with-block.
Fixed:
import csv

# do not use i here and down below, that's confusing; better names are a plus
for fileCount in range(10):
    filename = "filename{}.csv".format(fileCount)  # creates filename0.csv ... filename9.csv
    with open(filename, "w") as f:  # create the file anew
        csv_writer = csv.writer(f)
        # write headers
        csv_writer.writerow(["data1", "data2", "data3"])
        # write 4 rows of data
        for i in range(1, 5):
            cdata = [(fileCount * 100000 + i * 1000 + k) for k in range(3)]  # create 3 datapoints
            # writes one row of data, from [1000,1001,1002] up to [904000,904001,904002]
            # for the last i and fileCount
            csv_writer.writerow(cdata)
        # no f.close() needed - leaving the with open() scope autocloses
Check what we have written:
import os

for d in sorted(os.listdir("./")):
    if d.endswith("csv"):
        print(d, ":")
        print("*" * (len(d) + 2))
        with open(d, "r") as f:
            print(f.read())
        print("")
Output:
filename0.csv :
***************
data1,data2,data3
1000,1001,1002
2000,2001,2002
3000,3001,3002
4000,4001,4002
filename1.csv :
***************
data1,data2,data3
101000,101001,101002
102000,102001,102002
103000,103001,103002
104000,104001,104002
filename2.csv :
***************
data1,data2,data3
201000,201001,201002
[...snip the rest - you get the idea ...]
filename9.csv :
***************
data1,data2,data3
901000,901001,901002
902000,902001,902002
903000,903001,903002
904000,904001,904002
To add a new column to an existing file:
open the old file for reading
open a new file for writing
read the old file's header, add the new column header, and write it to the new file
read each row, add the new column's value to it, and write it to the new file
Example:
Adding the sum of column values to the file and writing as new file:
filename = "filename0.csv"
newfile = "filename0new.csv"
# open one file to read, open the other (new one) to write
with open(filename, "r") as r, open(newfile, "w") as w:
    reader = csv.reader(r)
    writer = csv.writer(w)
    newHeader = next(reader)    # read the header
    newHeader.append("Sum")     # append the new column header
    writer.writerow(newHeader)  # write header
    # for each row:
    for row in reader:
        row.append(sum(map(int, row)))  # read it, sum the converted int values
        writer.writerow(row)            # write it

# output the newly created file:
with open(newfile, "r") as n:
    print(n.read())
Output:
data1,data2,data3,Sum
1000,1001,1002,3003
2000,2001,2002,6003
3000,3001,3002,9003
4000,4001,4002,12003
I am trying to slice each .lvm file at row 22 and convert it to CSV; I have around 50 .lvm files in my directory, and after each is cropped, they all need to be merged together into one CSV file.
But I am not able to iterate over my .lvm files correctly.
import csv
import glob
from itertools import islice

path = "/mnt/b818255b-46bc-4a89-9ef3-42138e2ad25f/PASST 4.0/Messdaten/*.lvm"
for fname in glob.glob(path):
    with open(fname, 'rb') as fin:
        reader = csv.reader(fin)
        for row in reader:
            for field in row:
                with open(fname) as f, open("out.csv", "a") as out:
                    r = csv.reader(islice(f, 22, None))
                    wr = csv.writer(out)
                    wr.writerows(r)
But the first file in the directory is always the one processed inside the for loop. Can anyone please help me understand the problem?
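For reference, the nested row/field loops re-open the same file and append its whole sliced contents once per field of every row, which multiplies the output; slicing each file exactly once per glob iteration avoids that. A minimal sketch (the directory and sample files here are placeholders for the real .lvm data):

```python
import csv
import glob
import os
import tempfile
from itertools import islice

# build two tiny sample .lvm files (placeholders for the real directory)
workdir = tempfile.mkdtemp()
for n in range(2):
    with open(os.path.join(workdir, "m{}.lvm".format(n)), "w") as f:
        f.write("header\n" * 22)             # 22 header lines to skip
        f.write("{},1\n{},2\n".format(n, n)) # then two data rows

outpath = os.path.join(workdir, "out.csv")
for fname in sorted(glob.glob(os.path.join(workdir, "*.lvm"))):
    with open(fname, newline="") as f, open(outpath, "a", newline="") as out:
        # skip the first 22 lines, append the rest - once per file
        csv.writer(out).writerows(csv.reader(islice(f, 22, None)))

print(open(outpath).read())
```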
I have created a Python 2.7 script that does the following:
Gets a list of filenames from a folder, and writes them to a csv file, one for each row.
And
Enters data into a search box on the web.
Writes the result from the search box into another csv file.
So what I would like now is for the CSV data in (1) to act as the input for (2),
i.e. for each filename in the CSV file, it conducts a search for that cell.
Additionally, instead of just writing the results into a second CSV file in (3), I would like to append each result to the first CSV file, OR generate a new file with both columns.
I can provide the code, but since it's 50 lines already, I've just tried to keep this question descriptive.
Update: Proposed retrieval and append:
with open("file.csv", "a+") as f:
    r = csv.reader(f)
    wr = csv.writer(f, delimiter="\n")
    result = []
    for line in r:
        searchbox = driver.find_element_by_name("searchbox")
        searchbox.send_keys(line)
        sleep(8)
        search_reply = driver.find_element_by_class_name("search_reply")
        result = re.findall(r"((?<=\()[0-9]*)", search_reply.text)
        wr.writerow(result)
Open for reading and appending, store the output then write at the end:
import csv

with open("first.csv", "a+") as f:
    f.seek(0)  # "a+" opens positioned at the end of the file; rewind before reading
    r = csv.reader(f)
    wr = csv.writer(f, delimiter="\n")
    result = []
    for line in r:
        # process lines / step 2
        # append to result
        pass
    wr.writerow(result)
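A concrete sketch of that pattern, with the web search replaced by a placeholder function (process() here is a hypothetical stand-in for the search-box lookup, and the file names are assumptions). Reading all input rows first, then writing a new file with both columns, sidesteps mixing reads and appends on one handle:

```python
import csv

# sample input file with one filename per row (name is an assumption)
with open("first.csv", "w", newline="") as f:
    csv.writer(f).writerows([["fileA"], ["fileB"]])

def process(name):
    # placeholder for the search-box lookup in step 2
    return "result-for-" + name

# read every input row first...
with open("first.csv", newline="") as f:
    names = [row[0] for row in csv.reader(f)]

# ...then write input and result side by side in a new file
with open("combined.csv", "w", newline="") as out:
    wr = csv.writer(out)
    for name in names:
        wr.writerow([name, process(name)])
```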