I am new to Python and am having trouble adding a loop to some nested-loop code.
I am using Python 3.8 on a Windows 7 machine.
What the code does in a single run: it reads multiple CSV files, row by row and file by file, and uses the data from each row (within a given range) to call a function, until there are no CSV files left. Each CSV file has 4 columns and one header row.
There is a delay of a few seconds between reading each row.
Because the code always starts from the top of each file, running it again reads the same rows; it does not move on to the rows that follow.
So I want each run to somehow remember the last row that was used and start from the next row.
For example, with a range of 2 rows:
first run: uses rows 1 and 2 to call the function
second run: uses rows 3 and 4 to call the function, and so on
I would appreciate any help making this work.
Example CSV
img_url;title_1;desc_1;link_1
site.com/image22.jpg;someTitle;description1;site1.com
site.com/image32.jpg;someTitle;description2;site2.com
site.com/image44.jpg;someTitle;description3;site3.com
Here is the working code I have:
import time

from abc.zzz import xyz  # placeholder module and function names, as given in the original post

# Each entry pairs a CSV file with the ID that is passed to the function
path_id_map = [
    {'path': 'file1.csv', 'id': '12345678'},
    {'path': 'file2.csv', 'id': '44556677'},
    {'path': 'file3.csv', 'id': '33377799'},
    {'path': 'file4.csv', 'id': '66221144'}]
s_id = None
for pair in path_id_map:
    with open(pair['path'], 'r') as f:
        next(f)  # skip first header line
        for _ in range(1, 3):  # process two rows per file
            line = next(f)
            img_url, title_1, desc_1, link_1 = map(str.strip, line.split(';'))
            zzz.func1(img_url=img_url, title_1=title_1, desc_1=desc_1,
                      link_1=link_1, B_id=pair['id'], s_id=s_id)
            time.sleep(25)
**** Update ****
After a few days of looking for a solution, some code was posted (UPDATE 2 below),
but there is a major problem with it.
It works the way I want only when using the print function.
I adapted my function to it, but when it runs a second time or more it does not move on to the next rows (it only advances correctly for the last CSV file).
The author of the code could not correct it, and I cannot figure out what is wrong with it.
I checked the CSV files and tested them with the print function; they are OK.
Perhaps someone can help correct the problem, or suggest another solution altogether.
Hi, I hope I have understood what you're asking. I think the code below might guide you if you adjust it a little for your case. You can store the number of the last line read in a text file. I also assume that the semicolon is used as the delimiter.
UPDATE 1:
Okay, I think I came up with a solution to your problem, hopefully. The only prerequisite for running it is a text file containing the row number you want to begin with on the first run (e.g. 1).
# imports
import csv
import time
import os
import itertools

# txt file that contains the number of the line to start from next time
dir_txt = './'
fname_txt = 'number_of_last_line.txt'
path = os.path.join(dir_txt, fname_txt)

# assign line number to variable after reading relevant txt
with open(path, 'r', newline='') as f:
    n = int(f.read())

# define path of csv file
fpath = './file1.csv'

# open csv file
with open(fpath, 'r', newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=';')
    # Iterate over the rows of the csv. csv_reader.line_num counts from 1,
    # while islice indexes the rows from 0
    for row in itertools.islice(csv_reader, n, n+3):
        print('row {0} contains {1}'.format(csv_reader.line_num, row))
        time.sleep(3)
    # line_num is the number of lines read so far, which is also the
    # zero-based index of the row to start from next time
    n = csv_reader.line_num

# Store the starting line for the next run (done here in plain Python
# rather than by shelling out to echo)
with open(path, 'w') as f:
    f.write(str(n))
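To see how the islice offsets relate to csv_reader.line_num, here is a minimal sketch using a made-up in-memory CSV. The sample data and the names sample, start, batch and next_start are assumptions for illustration, not part of the original code.

```python
import csv
import io
import itertools

# Hypothetical in-memory CSV: one header line followed by five data rows.
sample = io.StringIO("h1;h2\na;1\nb;2\nc;3\nd;4\ne;5\n")
reader = csv.reader(sample, delimiter=';')

start = 1  # index 0 is the header, so the data starts at index 1
batch = list(itertools.islice(reader, start, start + 2))

# line_num counts the lines consumed so far, which is also the
# zero-based index of the next unread row.
next_start = reader.line_num
```

Here batch holds the first two data rows and next_start is 3, so passing 3 as the start of the next islice call picks up at the third data row.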
UPDATE 2:
Here's a revision with the code working for multiple files, using the input of @Error - Syntactical Remorse. The first thing you need to do is open the metadata.json file and insert the row number at which you want each file to begin, for the first run only. You also need to change the file directories to match your setup.
# Imports
import csv, json
import time
import os
import itertools

# define function
def get_json_metadata(json_fpath):
    """Read json file

    Args:
        json_fpath -- string (filepath)
    Returns:
        json_list -- list"""
    with open(json_fpath, mode='r') as json_file:
        json_str = json_file.read()
        json_list = json.loads(json_str)
    return json_list

# json file that contains the number of the line to start from next time
dir_json = './'
fname_json = 'metadata.json'
json_fpath = os.path.join(dir_json, fname_json)

# csv filenames, IDs and number of row to start reading are extracted
path_id_map = get_json_metadata(json_fpath)

# iterate over csvfiles
for nfile in path_id_map:
    print('\n------ Reading {} ------\n'.format(nfile['path']))
    with open(nfile['path'], 'r', newline='') as csvfile:
        csv_reader = csv.reader(csvfile, delimiter=';')
        # Iterate over the rows of the csv. csv_reader.line_num counts from 1,
        # while islice indexes the rows from 0
        for row in itertools.islice(csv_reader, nfile['nrow'], nfile['nrow']+5):
            # skip empty line (list)
            if not row:
                continue
            # assign values to variables
            img_url, title_1, desc_1, link_1 = row
            B_id = nfile['id']
            print('row {0} contains {1}'.format(csv_reader.line_num, row))
            time.sleep(3)
        # Store the number of the line to start from next time. This must stay
        # inside the loop over files; if it is dedented out of that loop,
        # only the last file's position gets updated
        nfile['nrow'] = csv_reader.line_num

# write the updated positions back once all files have been processed
with open(json_fpath, mode='w') as json_file:
    json_str = json.dumps(path_id_map, indent=4)
    json_file.write(json_str)
This is how the metadata.json format should be:
[
{
"path": "file1.csv",
"id": "12345678",
"nrow": 1
},
{
"path": "file2.csv",
"id": "44556677",
"nrow": 1
},
{
"path": "file3.csv",
"id": "33377799",
"nrow": 1
},
{
"path": "file4.csv",
"id": "66221144",
"nrow": 1
}
]
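One possible way to make the per-file bookkeeping harder to get wrong is to isolate it in small functions. The sketch below assumes the metadata.json layout shown above; read_batch and run_once are hypothetical helper names, and the real function call is left out (the rows are simply returned together with the file's id).

```python
import csv
import itertools
import json


def read_batch(csv_path, start, count, delimiter=';'):
    """Return (rows, next_start) for up to `count` rows from index `start`."""
    with open(csv_path, 'r', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=delimiter)
        # Empty rows (blank lines) are dropped from the batch.
        rows = [row for row in itertools.islice(reader, start, start + count) if row]
        # line_num is the number of lines consumed, i.e. the next start index.
        return rows, reader.line_num


def run_once(json_fpath, count=2):
    """Read the next batch from every listed file, then save the new positions."""
    with open(json_fpath, 'r') as json_file:
        entries = json.load(json_file)
    processed = []
    for entry in entries:
        # Each entry's own position is updated inside the loop over files.
        rows, entry['nrow'] = read_batch(entry['path'], entry['nrow'], count)
        processed.extend((entry['id'], row) for row in rows)
    with open(json_fpath, 'w') as json_file:
        json.dump(entries, json_file, indent=4)
    return processed
```

Calling run_once('metadata.json') repeatedly should then return rows 1 and 2 of each file on the first call, rows 3 and 4 on the second, and so on; the call to your own function would go where the rows are collected.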
I'm trying to write a function that reads an existing .csv file and copies every 20 rows to a newly created CSV file. It therefore needs a file counter (file_01, file_02, file_03, ...), where the first 20 rows are copied to file_01.csv, the next 20 to file_02.csv, and so on.
Currently I have this code, which hasn't worked for me so far.
import csv
import os.path
from itertools import islice

N = 20
new_filename = ""
filename = ""
with open(filename, "rb") as file: # the a opens it in append mode
    reader = csv.reader(file)
    for i in range(N):
        line = next(file).strip()
        #print(line)
with open(new_filename, 'wb') as outfh:
    writer = csv.writer(outfh)
    writer.writerow(line)
    writer.writerows(islice(reader, 2))
I have attached a file for testing.
https://1drv.ms/u/s!AhdJmaLEPcR8htYqFooEoYUwDzdZbg
32.01,18.42,58.98,33.02,55.37,63.25,12.82,-32.42,33.99,179.53,
41.11,33.94,67.85,57.61,59.23,94.69,19.43,-19.15,21.71,-161.13,
49.80,54.12,72.78,100.74,56.97,128.84,26.95,-6.76,10.07,-142.62,
55.49,81.02,68.93,148.17,49.25,157.32,34.94,5.39,0.44,-123.32,
56.01,112.81,59.27,177.87,38.50,179.63,43.43,18.42,-5.81,-102.24,
50.79,142.87,48.06,-162.32,26.60,-161.21,52.38,34.37,-7.42,-79.64,
41.54,167.36,37.12,-145.93,15.01,-142.84,60.90,57.05,-4.47,-56.54,
30.28,-172.09,27.36,-130.24,5.11,-123.66,66.24,91.12,-0.76,-35.44,
18.64,-153.20,19.52,-114.09,-1.54,-102.96,64.77,131.32,5.12,-21.68,
7.92,-134.07,14.24,-96.93,-3.79,-80.91,57.10,162.35,12.51,-9.21,
-0.34,-113.74,11.80,-78.73,-2.49,-58.46,46.75,-175.86,20.81,2.87,
-4.81,-91.85,11.78,-60.28,0.59,-39.26,35.75,-158.12,29.79,15.71,
-4.76,-68.67,13.79,-43.84,6.82,-24.69,25.27,-141.56,39.05,30.71,
-1.33,-46.42,18.44,-30.23,14.53,-11.95,16.21,-124.45,47.91,50.25,
4.14,-29.61,24.89,-18.02,23.01,0.10,9.59,-106.05,54.46,77.07,
11.04,-15.39,32.33,-6.66,31.92,12.48,6.24,-86.34,55.72,110.53,
18.69,-2.32,40.46,4.57,41.11,26.87,6.07,-65.68,50.25,142.78,
26.94,10.56,49.18,16.67,49.92,45.39,8.06,-46.86,40.13,168.29,
35.80,24.58,58.45,31.99,56.83,70.92,12.96,-31.90,28.10,-171.07,
44.90,41.72,67.41,55.89,59.21,103.94,19.63,-18.67,15.97,-152.40,
-5.41,-77.62,11.40,-63.21,4.80,-29.06,31.33,-151.44,43.00,37.25,
-2.88,-54.38,13.08,-46.00,12.16,-15.86,21.21,-134.62,51.25,59.16,
1.69,-35.73,17.44,-32.01,20.37,-3.78,13.06,-117.10,56.18,88.98,
8.15,-20.80,23.70,-19.66,29.11,8.29,7.74,-98.22,54.91,123.30,
15.52,-7.45,31.04,-8.22,38.22,21.78,5.76,-77.99,47.34,153.31,
23.53,5.38,39.07,2.98,47.29,38.71,6.58,-57.45,36.18,176.74,
32.16,18.76,47.71,14.88,55.08,61.71,9.76,-40.52,23.99,-163.75,
41.27,34.36,56.93,29.53,59.23,92.75,15.53,-26.40,12.16,-145.27,
49.92,54.65,66.04,51.59,57.34,126.97,22.59,-13.65,2.14,-126.20,
55.50,81.56,72.21,90.19,49.88,155.84,30.32,-1.48,-4.71,-105.49,
55.92,113.45,70.26,139.40,39.23,178.48,38.55,10.92,-7.09,-83.11,
50.58,143.40,61.40,172.50,27.38,-162.27,47.25,24.86,-4.77,-60.15,
41.30,167.74,50.34,-166.33,15.74,-143.93,56.21,43.14,-0.54,-38.22,
30.03,-171.78,39.24,-149.48,5.71,-124.87,63.77,70.19,4.75,-24.15,
18.40,-152.91,29.17,-133.78,-1.18,-104.31,66.51,108.81,11.86,-11.51,
7.69,-133.71,20.84,-117.74,-3.72,-82.28,61.95,146.15,20.05,0.65,
-0.52,-113.33,14.97,-100.79,-2.58,-59.75,52.78,172.46,28.91,13.29,
-4.91,-91.36,11.92,-82.84,0.34,-40.12,41.93,-167.91,38.21,27.90,
These are some of the problems with your current solution.
You created a csv.reader object but then you did not use it
You read each line but then you did not store them anywhere
You are not keeping track of 20 rows which was supposed to be your requirement
You created the output file in a separate with block which does not have access anymore to the read lines or the csv.reader object
Here's a working solution:
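import csv

inp_file = "input.csv"
out_file_pattern = "file_{:{fill}2}.csv"
max_rows = 20

with open(inp_file, "r", newline="") as inp_f:
    reader = csv.reader(inp_f)
    all_rows = []
    cur_file = 1
    for row in reader:
        all_rows.append(row)
        if len(all_rows) == max_rows:
            with open(out_file_pattern.format(cur_file, fill="0"), "w", newline="") as out_f:
                writer = csv.writer(out_f)
                writer.writerows(all_rows)
            all_rows = []
            cur_file += 1
    # Write any leftover rows (fewer than max_rows) to one last file
    if all_rows:
        with open(out_file_pattern.format(cur_file, fill="0"), "w", newline="") as out_f:
            csv.writer(out_f).writerows(all_rows)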
The flow is as follows:
Read each row of the CSV using a csv.reader
Store each row in an all_rows list
Once that list gets 20 rows, open a file and write all the rows to it
Use the csv.writer's writerows method
Use a cur_file counter to format the filename
Every time 20 rows are dumped to a file, empty out the list and increment the file counter
This solution includes the blank lines as part of the 20 rows. Your test file has actually 19 rows of CSV data and 1 row for a blank line. If you need to skip the blank line, just add a simple check of
if not row:
    continue
Also, as I mentioned in a comment, I assume that the input file is an actual CSV file, meaning it's a plain text file with CSV formatted data. If the input is actually an Excel file, then solutions like this won't work, because you'll need some special libraries to read Excel files, even if the contents visually looks like CSV or even if you rename the file to .csv.
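The flow described above can also be sketched with itertools.islice, which pulls up to 20 rows at a time from the reader. This is only an illustrative variant, not the code from the answer: split_csv is a hypothetical function name, blank lines are skipped rather than counted, and a final group of fewer than 20 rows is still written out.

```python
import csv
from itertools import islice


def split_csv(inp_file, out_file_pattern="file_{:02d}.csv", max_rows=20):
    """Write consecutive groups of up to `max_rows` rows to numbered files."""
    written = []
    with open(inp_file, "r", newline="") as inp_f:
        # csv.reader yields [] for a blank line; the filter drops those lazily.
        rows = (row for row in csv.reader(inp_f) if row)
        file_number = 1
        while True:
            chunk = list(islice(rows, max_rows))
            if not chunk:
                break  # input exhausted
            out_name = out_file_pattern.format(file_number)
            with open(out_name, "w", newline="") as out_f:
                csv.writer(out_f).writerows(chunk)
            written.append(out_name)
            file_number += 1
    return written
```

For example, split_csv("input.csv") would return the list of file names it created, starting with file_01.csv.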
Without using any special CSV libraries (e.g. csv; you could use one, but I don't know how to, and I don't think it is necessary for this case), you could:
# Check proper encoding for your file
with open(r"<file_name>", "r", encoding="utf-8") as excel_csv_fp:
    csv_data = excel_csv_fp.readlines()

file_counter = 0
new_file_name = ""
new_fp = ""
for line in csv_data:
    # readlines() keeps the trailing newline, so a blank line is "\n", not ""
    if line.strip() == "":
        if new_fp != "":
            new_fp.close()
        file_counter += 1
        new_file_name = "file_" + "{:02d}".format(file_counter)  # 1 turns into 01 and 10 remains 10
        new_fp = open("<some_path>/" + new_file_name + ".csv", "w", encoding="utf-8")  # Makes a new CSV file to start writing to
    elif new_fp != "":  # Make sure new_fp is a file pointer and not a string
        new_fp.write(line)  # Write each line that follows a blank line
if new_fp != "":
    new_fp.close()  # Close the last output file
If you have any questions on any of the code (how it works, why I choose what etc.), just ask in the comments and I'll try to reply as soon as possible.
I am trying to write a script that will take several 2 column files, write the first and second columns from the first one to a result file and then only the second columns from all other files and append them on.
Example:
File one                   File two
Column 1    Column 2       don't take this column    Column 2
Line 1      Line 2         don't take this column    Line 2
The final result should be
Result file
Column 1    Column 2    Column 2
Line 1      Line 2      Line 2
etc
I have almost everything working except for adding the second columns onto the first. I am opening the ResultFile as r+, and I want to read out the line that's there (the first file's data), then read the corresponding line from the other files, append it, and put it back in.
Here's the code I have for the second section:
#Open each subsequent file for 2nd column data
while n < i:
    with open(FileNames[n], "r") as InputFile:
        with ResultFile:
            Temp2 = ResultFile.readline()
            for line in InputFile:
                Temp2 += line.split(",", 1)[-1]
                if line == LastValue:
                    break
                if len(ResultFile.readline()) == "":
                    break
            YData += (Temp2 + "\n")
    n += 1
    InputFile.close
The break ifs are not working quite right at the moment; I just needed a way to end the infinite loop. Also, LastValue is equal to the last x column value from the first file.
Any help would be appreciated
EDIT
I'm trying to do this without itertools.
It might help to open up all the files first and store them in a list.
fileHandles = []
for f in fileNames:
    fileHandles.append(open(f))
Then you can just readline() them in order for each line in the first file.
dataLine = fileHandles[0].readline()
while dataLine:
    # rstrip drops the trailing newline so it doesn't end up in the middle of the output row
    outFields = dataLine.rstrip("\n").split(",")[0:2]
    for inFile in fileHandles[1:]:
        dataLine = inFile.readline()
        field = dataLine.rstrip("\n").split(",")[1]
        outFields.append(field)
    print(",".join(outFields))
    dataLine = fileHandles[0].readline()
Fundamentally you want to loop over all input files simultaneously the way zip does with iterators.
This example illustrates the pattern without the distraction of files and csvs:
file_row_col = [[['1A1', '1A2'],  # File 1, Row A, Column 1 and 2
                 ['1B1', '1B2']], # File 1, Row B, Column 1 and 2
                [['2A1', '2A2'],  # File 2
                 ['2B1', '2B2']],
                [['3A1', '3A2'],  # File 3
                 ['3B1', '3B2']]]
outrows = []
for rows in zip(*file_row_col):
    outrow = [rows[0][0]]  # Column 1 of the first file
    for row in rows:
        outrow.extend(row[1:])  # Only Column 2 and on
    outrows.append(outrow)
# outrows is now [['1A1', '1A2', '2A2', '3A2'],
#                 ['1B1', '1B2', '2B2', '3B2']]
The key to this is the transformation done by zip(*file_row_col).
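To make that transformation concrete, here is a small sketch of what zip(*file_row_col) produces on the same sample data; by_row and first are names chosen here for illustration.

```python
file_row_col = [[['1A1', '1A2'], ['1B1', '1B2']],   # File 1
                [['2A1', '2A2'], ['2B1', '2B2']],   # File 2
                [['3A1', '3A2'], ['3B1', '3B2']]]   # File 3

# zip(*file_row_col) is the same as zip(file1_rows, file2_rows, file3_rows):
# it regroups the data from "per file" to "per row position".
by_row = list(zip(*file_row_col))

# by_row[0] holds row A of every file, by_row[1] holds row B of every file.
first = by_row[0]
```

Note that zip stops at the shortest input, so if one file has fewer rows than the others, the extra rows of the longer files are silently dropped.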
Now let's reimplement this pattern with actual files. I'm going to use the csv library to make reading and writing the CSVs easier and safer.
import csv

infilenames = ['1.csv','2.csv','3.csv']
outfilename = 'result.csv'

# In Python 3 the csv module wants text-mode files opened with newline=''
with open(outfilename, 'w', newline='') as out:
    outcsv = csv.writer(out)
    infiles = []
    # A single `with` statement can't manage a variable-length list of
    # files, so we use try...finally the old-fashioned way instead.
    try:
        incsvs = []
        for infilename in infilenames:
            infile = open(infilename, 'r', newline='')
            infiles.append(infile)
            incsvs.append(csv.reader(infile))
        for inrows in zip(*incsvs):
            outrow = [inrows[0][0]]  # Column 1 of file 1
            for inrow in inrows:
                outrow.extend(inrow[1:])
            outcsv.writerow(outrow)
    finally:
        for infile in infiles:
            infile.close()
Given these input files:
#1.csv
1A1,1A2
1B1,1B2
#2.csv
2A1,2A2
2B1,2B2
#3.csv
3A1,3A2
3B1,3B2
the code produces this result.csv:
1A1,1A2,2A2,3A2
1B1,1B2,2B2,3B2
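In Python 3, contextlib.ExitStack is one way to manage a variable number of open files without writing the try...finally by hand. The sketch below applies it to the same merge; merge_second_columns is a hypothetical function name, and the file names are passed in by the caller.

```python
import csv
from contextlib import ExitStack


def merge_second_columns(infilenames, outfilename):
    """Keep both columns of the first file and only column 2 onward of the rest."""
    with ExitStack() as stack:
        # Every file registered on the stack is closed when the block exits.
        out = stack.enter_context(open(outfilename, 'w', newline=''))
        outcsv = csv.writer(out)
        incsvs = [csv.reader(stack.enter_context(open(name, 'r', newline='')))
                  for name in infilenames]
        for inrows in zip(*incsvs):
            outrow = [inrows[0][0]]  # Column 1 of the first file
            for inrow in inrows:
                outrow.extend(inrow[1:])  # Column 2 and on of every file
            outcsv.writerow(outrow)
```

With the three input files shown above, merge_second_columns(['1.csv', '2.csv', '3.csv'], 'result.csv') should write the same two rows.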