How to add another loop to a Python nested loop? - python

I am new to Python and am having trouble adding a loop to some nested-loop code.
I am using Python 3.8 on a Windows 7 machine.
What the code does in a single run: it reads multiple CSV files, file by file and row by row, and passes the data from each row (within a given range) to a function, until no CSV files are left. Each CSV file has 4 columns and one header line.
There is a delay of a few seconds between rows.
Because the code keeps no state, running it again reads the same rows; it never moves on to later rows.
So I want to add another loop, so that each time the script runs it remembers the last row it used and starts from the next one.
For example, with a range of 2 rows:
first run: uses rows 1 and 2 to run the function
second run: uses rows 3 and 4 to run the function, and so on
I would appreciate help making this work.
Example CSV
img_url,desc_1 title_1,link_1
site.com/image22.jpg;someTitle;description1;site1.com
site.com/image32.jpg;someTitle;description2;site2.com
site.com/image44.jpg;someTitle;description3;site3.com
Here is the working code I have:
import time
from abc.zzz import xyz
path_id_map = [
    {'path': 'file1.csv', 'id': '12345678'},
    {'path': 'file2.csv', 'id': '44556677'},
    {'path': 'file3.csv', 'id': '33377799'},
    {'path': 'file4.csv', 'id': '66221144'}]
s_id = None
for pair in path_id_map:
    with open(pair['path'], 'r') as f:
        next(f)  # skip first header line
        for _ in range(1, 3):
            line = next(f)
            img_url, title_1, desc_1, link_1 = map(str.strip, line.split(';'))
            zzz.func1(img_url=img_url, title_1=title_1, desc_1=desc_1,
                      link_1=link_1, B_id=pair['id'], s_id=s_id)
            time.sleep(25)
**** Update ****
After a few days of looking for a solution, some code was posted (UPDATE 2 in the answer below), but it has a major problem.
It works the way I want only when I use the print function.
I adapted my function to it, but when it runs a second time or more, it does not advance to the next rows (it only advances correctly for the last CSV file).
The author of the code could not correct it, and I cannot figure out what is wrong.
I checked the CSV files and tested them with the print function; they are OK.
Perhaps someone can help fix the problem or suggest another solution altogether.

Hi, I hope I have understood what you're asking. I think the code below might guide you if you adjust it a little for your case. You can store the number of the last line read in a text file. I also assume that the delimiter is a semicolon.
UPDATE 1:
Okay, I think I came up with a solution to your problem. The only prerequisite is a text file containing the row number you want to begin with on the first run (e.g. 1).
import csv
import time
import subprocess
import os
import itertools
# txt file that contains the number of the line to start from next time
dir_txt = './'
fname_txt = 'number_of_last_line.txt'
path = os.path.join(dir_txt, fname_txt)
# assign line number to variable after reading relevant txt
with open(path, 'r', newline='') as f:
    n = int(f.read())
# define path of csv file
fpath = './file1.csv'
# open csv file
with open(fpath, 'r', newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=';')
    # Iterate over the next rows of the csv. csv_reader.line_num counts from 1,
    # while islice indexes the rows from 0
    for row in itertools.islice(csv_reader, n, n+3):
        print('row {0} contains {1}'.format(csv_reader.line_num, row))
        time.sleep(3)
    # Store the number of the line to start from next time
    n = csv_reader.line_num
# Bash (or cmd) command execution, optional. You can do this with Python also
sh_command = 'echo {0} > {1}'.format(n, path)
subprocess.run(sh_command, shell=True)
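The shell `echo` step above can also be done in plain Python, which avoids depending on a particular shell. This is only a minimal sketch under the same assumption of a text file holding a single number; the helper names `read_start_row` and `write_start_row` are made up for illustration, and the demo runs in a temporary directory.

```python
import os
import tempfile

def read_start_row(path, default=1):
    """Return the stored row index, or `default` if the file does not exist yet."""
    try:
        with open(path, "r") as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return default

def write_start_row(path, n):
    """Overwrite the file with the row index to start from next time."""
    with open(path, "w") as f:
        f.write(str(n))

# Demo in a temporary directory so that no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    state_path = os.path.join(tmp, "number_of_last_line.txt")
    first = read_start_row(state_path)      # file is missing, so the default is used
    write_start_row(state_path, first + 3)  # pretend three rows were processed
    second = read_start_row(state_path)
print(first, second)
```

Falling back to a default when the file is missing also removes the need to create the text file by hand before the first run.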
UPDATE 2:
Here's a revision that works for multiple files, using the input of @Error - Syntactical Remorse. First, open the metadata.json file and enter the row number each file should start from; this is needed for the first run only. You also need to change the file paths to match your setup.
# Imports
import csv, json
import time
import os
import itertools
# define function
def get_json_metadata(json_fpath):
    """Read json file
    Args:
        json_fpath -- string (filepath)
    Returns:
        json_list -- list"""
    with open(json_fpath, mode='r') as json_file:
        json_str = json_file.read()
        json_list = json.loads(json_str)
    return json_list
# json file that contains the number of line to start the next time
dir_json = './'
fname_json = 'metadata.json'
json_fpath = os.path.join(dir_json, fname_json)
# csv filenames, IDs and number of row to start reading are extracted
path_id_map = get_json_metadata(json_fpath)
# iterate over csvfiles
for nfile in path_id_map:
    print('\n------ Reading {} ------\n'.format(nfile['path']))
    with open(nfile['path'], 'r', newline='') as csvfile:
        csv_reader = csv.reader(csvfile, delimiter=';')
        # Iterate every row of csv. csv_reader row number starts from 1,
        # csv_reader generator starts from 0
        for row in itertools.islice(csv_reader, nfile['nrow'], nfile['nrow']+5):
            # skip empty line (list)
            if not row:
                continue
            # assign values to variables
            img_url, title_1, desc_1, link_1 = row
            B_id = nfile['id']
            print('row {0} contains {1}'.format(csv_reader.line_num, row))
            time.sleep(3)
        # Store the number of line to start the next time
        nfile['nrow'] = csv_reader.line_num
with open(json_fpath, mode='w') as json_file:
    json_str = json.dumps(path_id_map, indent=4)
    json_file.write(json_str)
This is how the metadata.json format should be:
[
    {
        "path": "file1.csv",
        "id": "12345678",
        "nrow": 1
    },
    {
        "path": "file2.csv",
        "id": "44556677",
        "nrow": 1
    },
    {
        "path": "file3.csv",
        "id": "33377799",
        "nrow": 1
    },
    {
        "path": "file4.csv",
        "id": "66221144",
        "nrow": 1
    }
]
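The same idea can be packaged as one function that takes the per-row work as a callback, which makes it easier to swap print for a real function. The sketch below is an illustration, not the answer author's code: `run_batch` and `handler` are hypothetical names, and `handler` stands in for the asker's `zzz.func1`. It counts the rows it actually consumed instead of relying on `line_num`, and it updates and saves each file's counter inside the loop over files. If that update ends up after the loop instead, only the last file's counter advances, which is one possible cause of the symptom described in the question.

```python
import csv
import itertools
import json
import os
import tempfile

def run_batch(state_path, batch_size, handler):
    """Process the next `batch_size` rows of every CSV listed in the state file."""
    with open(state_path, "r") as f:
        state = json.load(f)
    for entry in state:
        start = entry["nrow"]  # 0-based row index; 1 skips the header line
        consumed = 0
        with open(entry["path"], "r", newline="") as csvfile:
            reader = csv.reader(csvfile, delimiter=";")
            for row in itertools.islice(reader, start, start + batch_size):
                consumed += 1
                if row:  # ignore blank lines
                    handler(entry["id"], row)
        entry["nrow"] = start + consumed
        # Save after every file so that progress survives an error in a later file.
        with open(state_path, "w") as f:
            json.dump(state, f, indent=4)

# Demo with two small temporary CSV files (header plus five data rows each).
with tempfile.TemporaryDirectory() as tmp:
    state = []
    for name, file_id in [("file1.csv", "12345678"), ("file2.csv", "44556677")]:
        csv_path = os.path.join(tmp, name)
        with open(csv_path, "w", newline="") as f:
            f.write("img_url;title_1;desc_1;link_1\n")
            for i in range(1, 6):
                f.write("img{0};title{0};desc{0};link{0}\n".format(i))
        state.append({"path": csv_path, "id": file_id, "nrow": 1})
    state_path = os.path.join(tmp, "metadata.json")
    with open(state_path, "w") as f:
        json.dump(state, f)

    seen = []
    def collect(file_id, row):
        seen.append((file_id, row[0]))

    run_batch(state_path, 2, collect)  # first run: rows 1 and 2 of each file
    run_batch(state_path, 2, collect)  # second run: rows 3 and 4 of each file
print(seen)
```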

Related

How to read a csv file and create a new csv file after every nth number of rows?

I'm trying to write a function that reads an existing .csv file and copies every 20 rows to a newly created CSV file. It therefore needs a file counter ("file_01, file_02, file_03, ..."), where the first 20 rows are copied to file_01.csv, the next 20 to file_02.csv, and so on.
Currently I have this code, which hasn't worked for me so far.
import csv
import os.path
from itertools import islice
N = 20
new_filename = ""
filename = ""
with open(filename, "rb") as file: # the a opens it in append mode
    reader = csv.reader(file)
    for i in range(N):
        line = next(file).strip()
        #print(line)
with open(new_filename, 'wb') as outfh:
    writer = csv.writer(outfh)
    writer.writerow(line)
    writer.writerows(islice(reader, 2))
I have attached a file for testing.
https://1drv.ms/u/s!AhdJmaLEPcR8htYqFooEoYUwDzdZbg
32.01,18.42,58.98,33.02,55.37,63.25,12.82,-32.42,33.99,179.53,
41.11,33.94,67.85,57.61,59.23,94.69,19.43,-19.15,21.71,-161.13,
49.80,54.12,72.78,100.74,56.97,128.84,26.95,-6.76,10.07,-142.62,
55.49,81.02,68.93,148.17,49.25,157.32,34.94,5.39,0.44,-123.32,
56.01,112.81,59.27,177.87,38.50,179.63,43.43,18.42,-5.81,-102.24,
50.79,142.87,48.06,-162.32,26.60,-161.21,52.38,34.37,-7.42,-79.64,
41.54,167.36,37.12,-145.93,15.01,-142.84,60.90,57.05,-4.47,-56.54,
30.28,-172.09,27.36,-130.24,5.11,-123.66,66.24,91.12,-0.76,-35.44,
18.64,-153.20,19.52,-114.09,-1.54,-102.96,64.77,131.32,5.12,-21.68,
7.92,-134.07,14.24,-96.93,-3.79,-80.91,57.10,162.35,12.51,-9.21,
-0.34,-113.74,11.80,-78.73,-2.49,-58.46,46.75,-175.86,20.81,2.87,
-4.81,-91.85,11.78,-60.28,0.59,-39.26,35.75,-158.12,29.79,15.71,
-4.76,-68.67,13.79,-43.84,6.82,-24.69,25.27,-141.56,39.05,30.71,
-1.33,-46.42,18.44,-30.23,14.53,-11.95,16.21,-124.45,47.91,50.25,
4.14,-29.61,24.89,-18.02,23.01,0.10,9.59,-106.05,54.46,77.07,
11.04,-15.39,32.33,-6.66,31.92,12.48,6.24,-86.34,55.72,110.53,
18.69,-2.32,40.46,4.57,41.11,26.87,6.07,-65.68,50.25,142.78,
26.94,10.56,49.18,16.67,49.92,45.39,8.06,-46.86,40.13,168.29,
35.80,24.58,58.45,31.99,56.83,70.92,12.96,-31.90,28.10,-171.07,
44.90,41.72,67.41,55.89,59.21,103.94,19.63,-18.67,15.97,-152.40,
-5.41,-77.62,11.40,-63.21,4.80,-29.06,31.33,-151.44,43.00,37.25,
-2.88,-54.38,13.08,-46.00,12.16,-15.86,21.21,-134.62,51.25,59.16,
1.69,-35.73,17.44,-32.01,20.37,-3.78,13.06,-117.10,56.18,88.98,
8.15,-20.80,23.70,-19.66,29.11,8.29,7.74,-98.22,54.91,123.30,
15.52,-7.45,31.04,-8.22,38.22,21.78,5.76,-77.99,47.34,153.31,
23.53,5.38,39.07,2.98,47.29,38.71,6.58,-57.45,36.18,176.74,
32.16,18.76,47.71,14.88,55.08,61.71,9.76,-40.52,23.99,-163.75,
41.27,34.36,56.93,29.53,59.23,92.75,15.53,-26.40,12.16,-145.27,
49.92,54.65,66.04,51.59,57.34,126.97,22.59,-13.65,2.14,-126.20,
55.50,81.56,72.21,90.19,49.88,155.84,30.32,-1.48,-4.71,-105.49,
55.92,113.45,70.26,139.40,39.23,178.48,38.55,10.92,-7.09,-83.11,
50.58,143.40,61.40,172.50,27.38,-162.27,47.25,24.86,-4.77,-60.15,
41.30,167.74,50.34,-166.33,15.74,-143.93,56.21,43.14,-0.54,-38.22,
30.03,-171.78,39.24,-149.48,5.71,-124.87,63.77,70.19,4.75,-24.15,
18.40,-152.91,29.17,-133.78,-1.18,-104.31,66.51,108.81,11.86,-11.51,
7.69,-133.71,20.84,-117.74,-3.72,-82.28,61.95,146.15,20.05,0.65,
-0.52,-113.33,14.97,-100.79,-2.58,-59.75,52.78,172.46,28.91,13.29,
-4.91,-91.36,11.92,-82.84,0.34,-40.12,41.93,-167.91,38.21,27.90,
These are some of the problems with your current solution.
You created a csv.reader object but then you did not use it
You read each line but then you did not store them anywhere
You are not keeping track of 20 rows which was supposed to be your requirement
You created the output file in a separate with block which does not have access anymore to the read lines or the csv.reader object
Here's a working solution:
import csv
inp_file = "input.csv"
out_file_pattern = "file_{:{fill}2}.csv"
max_rows = 20
# newline="" is what the csv module expects for files it reads or writes
with open(inp_file, "r", newline="") as inp_f:
    reader = csv.reader(inp_f)
    all_rows = []
    cur_file = 1
    for row in reader:
        all_rows.append(row)
        if len(all_rows) == max_rows:
            with open(out_file_pattern.format(cur_file, fill="0"), "w", newline="") as out_f:
                writer = csv.writer(out_f)
                writer.writerows(all_rows)
            all_rows = []
            cur_file += 1
The flow is as follows:
Read each row of the CSV using a csv.reader
Store each row in an all_rows list
Once that list gets 20 rows, open a file and write all the rows to it
Use the csv.writer's writerows method
Use a cur_file counter to format the filename
Every time 20 rows are dumped to a file, empty out the list and increment the file counter
This solution includes the blank lines as part of the 20 rows. Your test file has actually 19 rows of CSV data and 1 row for a blank line. If you need to skip the blank line, just add a simple check of
if not row:
    continue
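As a variation on the flow above, the rows can be pulled in groups with `itertools.islice`, which also writes a final group that has fewer than 20 rows. This is a sketch rather than the answer's code: the function name `split_csv`, the `out_dir` parameter and the demo data are assumptions made for illustration.

```python
import csv
import os
import tempfile
from itertools import islice

def split_csv(inp_file, out_dir, max_rows=20):
    """Write groups of `max_rows` non-blank rows to file_01.csv, file_02.csv, ..."""
    written = []
    with open(inp_file, "r", newline="") as inp_f:
        rows = (row for row in csv.reader(inp_f) if row)  # drop blank lines
        while True:
            chunk = list(islice(rows, max_rows))  # next group, possibly shorter
            if not chunk:
                break
            name = "file_{:02d}.csv".format(len(written) + 1)
            out_path = os.path.join(out_dir, name)
            with open(out_path, "w", newline="") as out_f:
                csv.writer(out_f).writerows(chunk)
            written.append((out_path, len(chunk)))
    return written

# Demo: 45 data rows with one blank line in between.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv")
    with open(src, "w", newline="") as f:
        for i in range(45):
            f.write("{},{}\n".format(i, i * 2))
            if i == 18:
                f.write("\n")  # a blank line, as in the test file
    result = split_csv(src, tmp)
    names = [os.path.basename(p) for p, _ in result]
    sizes = [n for _, n in result]
print(names, sizes)
```

Whether the last, shorter group should be written at all depends on the requirement; dropping it only needs a length check before the file is opened.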
Also, as I mentioned in a comment, I assume that the input file is an actual CSV file, meaning it's a plain text file with CSV formatted data. If the input is actually an Excel file, then solutions like this won't work, because you'll need some special libraries to read Excel files, even if the contents visually looks like CSV or even if you rename the file to .csv.
Without using any CSV library (you could use one such as csv, but I don't know it well and don't think it is necessary here), you could do the following, which starts a new output file after every blank line:
with open(r"<file_name>", "r", encoding="utf-8") as excel_csv_fp: # Check proper encoding for your file
    csv_data = excel_csv_fp.readlines()
file_counter = 0
new_fp = None
for line in csv_data:
    if line.strip() == "": # readlines() keeps the "\n", so compare the stripped line
        if new_fp is not None:
            new_fp.close() # a blank line ends the current block
            new_fp = None
        continue
    if new_fp is None: # first data line of a block: start a new CSV file
        file_counter += 1
        new_file_name = "file_" + "{:02d}".format(file_counter) # 1 turns into 01; 10 stays 10
        new_fp = open("<some_path>/" + new_file_name + ".csv", "w", encoding="utf-8")
    new_fp.write(line) # copy the data line unchanged
if new_fp is not None:
    new_fp.close()
If you have any questions on any of the code (how it works, why I choose what etc.), just ask in the comments and I'll try to reply as soon as possible.

How do I add a column to a newly created CSV file?

I would like to create files at run time and then add values, column by column, to the corresponding CSV file as the program runs.
How can I add columns to each of the CSV files that the program generates?
Here is the code I'm using now.
import csv
for i in range(10):
    SD="Save datas(Angle)"+str(i) ## specify an array for each angle
    SDArray1=str(SD) ## build the file name
    f=open(SDArray1+".csv","a+t") ## create the file with the generated name
    csv_writer = csv.writer(f)
    csv_writer.writerow([SD])
    print("One loop has started")
    f.close()
    for i in range(1,5):
        cdata=[i]
        f=open(SDArray1+".csv","a+t")
        csv_writer =csv.writer(f)
        csv_writer.writerow(cdata)
        print(cdata)
        f.close()
    print("loop's finished!")
The code above creates a set of files. The files are created correctly, but I was wondering how to add columns to them.
csv_writer.writerow() takes a complete row of columns. If you need more columns, add them to your cdata=[i], e.g. cdata=[i,i*2,i*3,i*4].
You should use with open() as f: for file manipulation, it is more resilient against errors and autocloses the file when leaving the with-block.
Fixed:
import csv
# do not use i here and down below, that's confusing; better names are a plus
for fileCount in range(10):
    filename = "filename{}.csv".format(fileCount) # creates filename0.csv ... filename9.csv
    with open(filename,"w",newline="") as f: # create the file anew
        csv_writer = csv.writer(f)
        # write headers
        csv_writer.writerow(["data1","data2","data3"])
        # write 4 rows of data
        for i in range(1,5):
            cdata=[(fileCount*100000+i*1000+k) for k in range(3)] # create 3 datapoints
            # write one row of data [1000,1001,1002] up to [904000,904001,904002]
            # for last i and fileCount
            csv_writer.writerow(cdata)
# no f.close() needed: leaving the with open() scope closes the file automatically
Check what we have written:
import os
for d in sorted(os.listdir("./")):
    if d.endswith("csv"):
        print(d,":")
        print("*"*(len(d)+2))
        with open(d,"r") as f:
            print(f.read())
        print("")
Output:
filename0.csv :
***************
data1,data2,data3
1000,1001,1002
2000,2001,2002
3000,3001,3002
4000,4001,4002
filename1.csv :
***************
data1,data2,data3
101000,101001,101002
102000,102001,102002
103000,103001,103002
104000,104001,104002
filename2.csv :
***************
data1,data2,data3
201000,201001,201002
[...snip the rest - you get the idea ...]
filename9.csv :
***************
data1,data2,data3
901000,901001,901002
902000,902001,902002
903000,903001,903002
904000,904001,904002
To add a new column to an existing file:
open old file to read
open new file to write
read the old files header, add new column header and write it in new file
read all rows, add new columns value to each row and write it in new file
Example:
Adding the sum of column values to the file and writing as new file:
filename = "filename0.csv"
newfile = "filename0new.csv"
# open one file to read, open other (new one) to write
with open(filename,"r",newline="") as r, open(newfile,"w",newline="") as w:
    reader = csv.reader(r)
    writer = csv.writer(w)
    newHeader = next(reader) # read the header
    newHeader.append("Sum") # append new column-header
    writer.writerow(newHeader) # write header
    # for each row:
    for row in reader:
        row.append(sum(map(int,row))) # read it, sum the converted int values
        writer.writerow(row) # write it
# output the newly created file:
with open(newfile,"r") as n:
    print(n.read())
Output:
data1,data2,data3,Sum
1000,1001,1002,3003
2000,2001,2002,6003
3000,3001,3002,9003
4000,4001,4002,12003

Compare rows of csv and work out percentage

I'm relatively new to Python. I'm trying to find a way to create a script that looks at a CSV file called "data_old" from a previous month, and compares it with the data in a more recent month called "data_new", then finally outputs that data into a new CSV "data_compare".
The files each month are consistently laid out and look like this (example)
Month 1
Company, StaffNumber, NeedToPass, Passed, %age meeting requirement
xxxxxxxx, 100, 80, 30, 30%
Month 3
Company, StaffNumber, NeedToPass, Passed, %meeting requirement
xxxxxxxx, 101, 81, 54, 60%
I'm trying to get the output file to compare the data from all rows and show me "Percentage improved" instead of "Percentage meeting requirement". Nothing I try seems to work.
As the numbers change all the time the only common data will be the company name.
I need a simple, explanatory way with comments... as I'd like to understand the logic so I can modify it and add functions.
Much appreciated.
Here is a Python code example which might do what you want. This script assumes that the two input CSV files have the same number of lines. The function test uses zip, which stops as soon as one list is exhausted. If your files have different numbers of lines, you have to loop over both manually. But I think it is a good starting point.
#!/usr/bin/env python3
import csv
def parse_csv(filename, sort_row=0, as_dict=False, delimiter=","):
    r = list()
    with open(filename, "r", newline="") as f:
        # make csv reader object
        reader = csv.reader(f, delimiter=delimiter)
        # read the header line separately so that it is not sorted with the data
        header = [h.strip() for h in next(reader)]
        for row in reader:
            if not row:
                # skip blank lines
                continue
            # strip each item in the row
            row = [h.strip() for h in row]
            if as_dict:
                # make dict if desired
                r.append(dict(zip(header, row)))
            else:
                r.append(row)
    # sort the data rows by the chosen column (company name in this example)
    if as_dict:
        r.sort(key=lambda x: x[header[sort_row]])
    else:
        r.sort(key=lambda x: x[sort_row])
        # put the header back as the first row
        r.insert(0, header)
    return r
def write_csv(filename, fieldnames, rows, delimiter=","):
    with open(filename, "w", newline="") as f:
        # make csv writer object
        writer = csv.writer(f, delimiter=delimiter)
        # write the first header line
        writer.writerow(fieldnames)
        for row in rows:
            # write each row
            writer.writerow(row)
def test():
    data_old = parse_csv("m1.csv")
    data_new = parse_csv("m2.csv")
    #write_csv("data_compare.csv", data_old[:1][0], data_old[1:])
    result = list()
    # loop over the items (skipping the first header row)
    for o, n in zip(data_old[1:], data_new[1:]):
        # calculate the improvement (or whatever needs to be calculated)
        value = float(n[4].replace("%", "")) - float(o[4].replace("%", ""))
        # create the row
        result.append([o[0], "%s%%" % value, o[4], n[4]])
        #result.append(["%s%%" % value])
    header = ["Company", "Percentage improved", "old", "new"]
    #header = ["Company", "Percentage improved"]
    write_csv("data_compare.csv", header, result)
if __name__ == '__main__':
    test()
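Since the question says the company name is the only stable field, the rows can also be matched by name instead of by position, which copes with files of different lengths. The sketch below is one way to do that, not part of the answer above: the helper `load_by_company` and the sample companies are invented, and the CSV text is kept in memory only so that the example is self-contained. Real code would open the two monthly files instead.

```python
import csv
import io

def load_by_company(text):
    """Map company name -> percentage (as float) from CSV text with a header row."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header line
    # Column 0 is the company, column 4 the percentage such as "30%".
    return {row[0].strip(): float(row[4].strip().rstrip("%")) for row in reader if row}

old_text = (
    "Company, StaffNumber, NeedToPass, Passed, %age meeting requirement\n"
    "Acme, 100, 80, 30, 30%\n"
    "Beta, 50, 40, 25, 50%\n"
    "OnlyOld, 10, 8, 4, 40%\n"
)
new_text = (
    "Company, StaffNumber, NeedToPass, Passed, %age meeting requirement\n"
    "Beta, 52, 41, 23, 45%\n"
    "Acme, 101, 81, 54, 60%\n"
    "OnlyNew, 20, 15, 3, 15%\n"
)

old = load_by_company(old_text)
new = load_by_company(new_text)

# Compare only the companies that appear in both months.
improved = {name: new[name] - old[name] for name in sorted(old.keys() & new.keys())}

# Write the comparison as CSV (to a string here; a file object works the same way).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Company", "Percentage improved"])
for name, change in improved.items():
    writer.writerow([name, "{:+.1f}%".format(change)])
print(out.getvalue())
```

Companies that appear in only one month are skipped here; they could equally be reported separately.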

Loop retrieve data from csv, append to file

I have created a Python 2.7 script that does the following:
1. Gets a list of filenames from a folder and writes them to a csv file, one per row.
And
2. Enters data into a search box on the web.
3. Writes the result from the search box into another csv file.
What I would like now is for the csv data in (1) to act as the input for (2),
i.e. for each filename in the csv file, it conducts a search for that cell.
Additionally, instead of just writing the results into a second csv file in (3), I would like to append the result to the first csv file, or generate a new one with both columns.
I can provide the code, but since it's 50 lines already, I've tried to keep this question descriptive.
Update: Proposed retrieval and append:
with open("file.csv","a+") as f:
    r = csv.reader(f)
    wr = csv.writer(f, delimiter="\n")
    result = []
    for line in r:
        searchbox = driver.find_element_by_name("searchbox")
        searchbox.send_keys(line)
        sleep(8)
        search_reply = driver.find_element_by_class_name("search_reply")
        result = re.findall("((?<=\()[0-9]*)", search_reply.text)
        wr.writerow(result)
Open for reading and appending, store the output then write at the end:
import csv
with open("first.csv","a+") as f:
    f.seek(0) # "a+" can leave the read position at the end of the file
    r = csv.reader(f)
    wr = csv.writer(f,delimiter="\n")
    result = []
    for line in r:
        # process lines/step 2
        # append to result
        pass
    wr.writerow(result)
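For the other option mentioned in the question, generating a new file with both columns, a sketch could look like the following. It is written for Python 3, unlike the question's 2.7 script, and it is only an illustration: `add_result_column` is a made-up name, and `fake_search` is a hypothetical stand-in for the Selenium lookup.

```python
import csv
import os
import tempfile

def add_result_column(in_path, out_path, search):
    """Copy each row of in_path to out_path with search(row[0]) appended."""
    with open(in_path, "r", newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines
            writer.writerow(row + [search(row[0])])

def fake_search(filename):
    # Hypothetical placeholder: the real script would query the web search box here.
    return str(len(filename))

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.csv")
    combined = os.path.join(tmp, "combined.csv")
    with open(first, "w", newline="") as f:
        f.write("report.pdf\nnotes.txt\n")
    add_result_column(first, combined, fake_search)
    with open(combined, "r", newline="") as f:
        combined_rows = list(csv.reader(f))
print(combined_rows)
```

Writing to a second file avoids reading and appending through the same file handle, and the original list of filenames stays untouched if a search fails halfway.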

Building list of lists from CSV file

I have an Excel file(that I am exporting as a csv) that I want to parse, but I am having trouble with finding the best way to do it. The csv is a list of computers in my network, and what accounts are in the local administrator group for each one. I have done something similar with tuples, but the number of accounts for each computer range from 1 to 30. I want to build a list of lists, then go through each list to find the accounts that should be there(Administrator, etc.) and delete them, so that I can then export a list of only accounts that shouldn't be a local admin, but are. The csv file is formatted as follows:
"computer1" Administrator localadmin useraccount
"computer2" localadmin Administrator
"computer3" localadmin Administrator user2account
Any help would be appreciated
EDIT: Here is the code I am working with
import csv
import sys #used for passing in the argument
file_name = sys.argv[1] #filename is argument 1
with open(file_name, 'rU') as f: #opens PW file
    reader = csv.reader(f)
    data = list(list(rec) for rec in csv.reader(f, delimiter=',')) #reads csv into a list of lists
    f.close() #close the csv
for i in range(len(data)):
    print data[i][0] #this alone will print all the computer names
    for j in range(len(data[i])) #Trying to run another for loop to print the usernames
        print data[i][j]
The issue is with the second for loop. I want to be able to read across each line and for now, just print them.
This should get you on the right track:
import csv
import sys #used for passing in the argument
file_name = sys.argv[1] #filename is argument 1
with open(file_name, 'rU') as f: #opens PW file
    reader = csv.reader(f)
    data = list(list(rec) for rec in csv.reader(f, delimiter=',')) #reads csv into a list of lists
for row in data:
    print row[0] #this alone will print all the computer names
    for username in row: #Trying to run another for loop to print the usernames
        print username
Last two lines will print all of the row (including the "computer"). Do
for x in range(1, len(row)):
    print row[x]
... to avoid printing the computer twice.
Note that f.close() is not required when using the "with" construct because the resource will automatically be closed when the "with" block is exited.
Personally, I would just do:
import csv
import sys #used for passing in the argument
file_name = sys.argv[1] #filename is argument 1
with open(file_name, 'rU') as f: #opens PW file
    reader = csv.reader(f)
    # Print every value of every row.
    for row in reader:
        for value in row:
            print value
That's a reasonable way to iterate through the data and should give you a firm basis to add whatever further logic is required.
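Building on that loop, the filtering step described in the question (removing the accounts that are supposed to be there) could be sketched as below. This is written for Python 3 and rests on assumptions: the export is taken to be comma-delimited with the computer name in the first column, the `EXPECTED` set is an invented example, and `unexpected_admins` is a made-up helper name.

```python
import csv
import io

# Assumed list of accounts that are allowed to be local administrators.
EXPECTED = {"Administrator", "localadmin"}

def unexpected_admins(lines, expected=EXPECTED):
    """Return {computer: [accounts not in expected]} for computers that have any."""
    report = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        computer = row[0]
        accounts = [a.strip() for a in row[1:] if a.strip()]
        extras = [a for a in accounts if a not in expected]
        if extras:
            report[computer] = extras
    return report

# In-memory sample standing in for the exported file.
sample = io.StringIO(
    "computer1,Administrator,localadmin,useraccount\n"
    "computer2,localadmin,Administrator\n"
    "computer3,localadmin,Administrator,user2account\n"
)
report = unexpected_admins(sample)
print(report)
```

Passing an open file object instead of the `io.StringIO` sample works the same way, because `csv.reader` accepts any iterable of lines.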
This is how I opened a .csv file and imported columns of data as numpy arrays - naturally, you don't need numpy arrays, but...
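import os
import sys
import numpy as np
# QApplication and QFileDialog come from PyQt
# (assumed here: from PyQt4.QtGui import QApplication, QFileDialog)
data = {}
app = QApplication( sys.argv )
fname = unicode ( QFileDialog.getOpenFileName() )
app.quit()
# drop the extension; fname.strip('.csv') would strip characters, not the suffix
filename = os.path.splitext(fname)[0] + ' for release.csv'
#open the file and skip the first two rows of data
imported_array = np.loadtxt(fname, delimiter=',', skiprows = 2)
data = {'time_s':imported_array[:,0]}
data['Speed_RPM'] = imported_array[:,1]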
It can be done using the pandas library.
import pandas as pd
df = pd.read_csv(filename)
list_of_lists = df.values.tolist()
This approach applies to other kinds of data like .tsv, etc.
