Unable to read CSV file in a nested loop - python

import csv

s = open('models.csv')
checkIt = csv.reader(s)
o = open('data.csv')
csv_o = csv.reader(o)

for c in checkIt:
    abc = c[0].split(".")
    abcd = abc[2]
    commodity_type = abcd[6:]
    print(commodity_type)
    for csv in csv_o:
        print(csv)
        print(commodity_type)
The print function executes only once, but it should execute 4 times because I have 4 rows in the models.csv file.
Please suggest a solution so that the nested for loop runs once for each row in models.csv.

Try resetting the file pointer of the file that csv_o reads from:

for csv in csv_o:
    print(csv)
    print(commodity_type)
o.seek(0)

The o.seek(0) goes inside the outer loop, after the inner loop finishes. Rewinding the underlying file makes the CSV reader begin reading from the start of the file on the next outer iteration. (As an aside, avoid naming the loop variable csv, since it shadows the imported csv module.)
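To see the rewind in action, here is a self-contained sketch using in-memory files (the file contents are made up for illustration):

```python
import csv
import io

# Hypothetical stand-ins for models.csv and data.csv
models = io.StringIO("a.b.cat001model\na.b.cat002model\n")
data = io.StringIO("x;1\ny;2\n")

outer = csv.reader(models)
inner = csv.reader(data, delimiter=';')

pairs = []
for m in outer:
    for row in inner:
        pairs.append((m[0], row))
    data.seek(0)  # rewind so the inner reader yields rows again next time

print(len(pairs))  # 2 outer rows x 2 inner rows = 4
```

Without the seek, the second outer iteration would find the inner reader already exhausted, and the inner loop body would never run again.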


How to add another loop to a Python nested loop?

I am new to Python and am having a problem adding a loop to a nested-loop Python program.
I am using Python 3.8 on my Windows 7 machine.
Here is what the code does when run once: it reads from multiple CSV files, row by row and file by file, and uses the data from each row (within a given range) to run a function until there are no CSV files left. Each CSV file has 4 columns, and all CSV files have one header row.
There is a delay of a few seconds between row reads.
Since the code is written for one-time use, when you run it again it reads the same rows; it does not move on to the other rows.
So I want to add another loop so that each time you run the file it somehow remembers the last row that was used and starts from the next row.
So assume it has been set to a range of 2 rows:
first run: uses rows 1 and 2 to run the function
second run: uses rows 3 and 4 to run the function, and so on
I'd appreciate your help making this work.
Example CSV (semicolon-delimited):
img_url;title_1;desc_1;link_1
site.com/image22.jpg;someTitle;description1;site1.com
site.com/image32.jpg;someTitle;description2;site2.com
site.com/image44.jpg;someTitle;description3;site3.com
Here is the working code I have:

from abc.zzz import xyz
import time

path_id_map = [
    {'path': 'file1.csv', 'id': '12345678'},
    {'path': 'file2.csv', 'id': '44556677'},
    {'path': 'file3.csv', 'id': '33377799'},
    {'path': 'file4.csv', 'id': '66221144'}]

s_id = None
for pair in path_id_map:
    with open(pair['path'], 'r') as f:
        next(f)  # skip the header line
        for _ in range(1, 3):
            line = next(f)
            img_url, title_1, desc_1, link_1 = map(str.strip, line.split(';'))
            zzz.func1(img_url=img_url, title_1=title_1, desc_1=desc_1,
                      link_1=link_1, B_id=pair['id'], s_id=s_id)
            time.sleep(25)
**** Update ****
After a few days of looking for a solution, the code below was posted (see UPDATE 2), but there is a major problem with it: it works the way I want only when using the print function. I adapted my own function to it, but when it runs a second time or more it does not move on to the next rows (though it does loop correctly on the last CSV file). The author of the code could not correct it, and I cannot figure out what is wrong with it. I checked the CSV files and tested them with the print function; they are OK. Perhaps someone can help correct the problem, or offer another solution altogether.
Hi, I hope I have understood what you're asking. I think the code below might guide you if you adjust it a little for your case. You can store the number of the last line in a text file. I also assume that a semicolon is used as the delimiter.
UPDATE 1:
Okay, I think I came up with a solution to your problem, hopefully. The only prerequisite to run this is a text file containing the number of the row you want to begin with on the first run (e.g. 1).
import csv
import time
import subprocess
import os
import itertools

# txt file that contains the number of the line to start from next time
dir_txt = './'
fname_txt = 'number_of_last_line.txt'
path = os.path.join(dir_txt, fname_txt)

# assign the line number to a variable after reading the txt file
with open(path, 'r', newline='') as f:
    n = int(f.read())

# define the path of the csv file
fpath = './file1.csv'

# open the csv file
with open(fpath, 'r', newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=';')
    # Iterate over the rows of the csv. csv_reader.line_num starts from 1,
    # the csv_reader generator starts from 0
    for row in itertools.islice(csv_reader, n, n + 3):
        print('row {0} contains {1}'.format(csv_reader.line_num, row))
        time.sleep(3)

# Store the number of the line to start from next time
n = csv_reader.line_num + 1

# Bash (or cmd) command execution; optional, you could do this with Python as well
sh_command = 'echo {0} > {1}'.format(csv_reader.line_num, path)
subprocess.run(sh_command, shell=True)
UPDATE 2:
Here's a revision of the code that works for multiple files, using the input of @Error - Syntactical Remorse. The first thing you need to do is open the metadata.json file and insert the number of the row each file should begin at, for the first run only. You also need to change the file directories according to your situation.
import csv
import json
import time
import os
import itertools

# define function
def get_json_metadata(json_fpath):
    """Read json file

    Args:
        json_fpath -- string (filepath)
    Returns:
        json_list -- list"""
    with open(json_fpath, mode='r') as json_file:
        json_str = json_file.read()
        json_list = json.loads(json_str)
    return json_list

# json file that contains the number of the line to start from next time
dir_json = './'
fname_json = 'metadata.json'
json_fpath = os.path.join(dir_json, fname_json)

# csv filenames, IDs and the number of the row to start reading at are extracted
path_id_map = get_json_metadata(json_fpath)

# iterate over the csv files
for nfile in path_id_map:
    print('\n------ Reading {} ------\n'.format(nfile['path']))
    with open(nfile['path'], 'r', newline='') as csvfile:
        csv_reader = csv.reader(csvfile, delimiter=';')
        # Iterate over the rows of the csv. csv_reader.line_num starts from 1,
        # the csv_reader generator starts from 0
        for row in itertools.islice(csv_reader, nfile['nrow'], nfile['nrow'] + 5):
            # skip empty lines (empty lists)
            if not row:
                continue
            # assign values to variables
            img_url, title_1, desc_1, link_1 = row
            B_id = nfile['id']
            print('row {0} contains {1}'.format(csv_reader.line_num, row))
            time.sleep(3)
        # Store the number of the line to start from next time
        nfile['nrow'] = csv_reader.line_num

with open(json_fpath, mode='w') as json_file:
    json_str = json.dumps(path_id_map, indent=4)
    json_file.write(json_str)
This is how the metadata.json format should be:
[
    {
        "path": "file1.csv",
        "id": "12345678",
        "nrow": 1
    },
    {
        "path": "file2.csv",
        "id": "44556677",
        "nrow": 1
    },
    {
        "path": "file3.csv",
        "id": "33377799",
        "nrow": 1
    },
    {
        "path": "file4.csv",
        "id": "66221144",
        "nrow": 1
    }
]

How to read a csv file and create a new csv file after every nth number of rows?

I'm trying to write a function that reads a sheet of an existing .csv file and copies every 20 rows to a newly created csv file. It therefore needs a file-counter naming scheme ("file_01", "file_02", "file_03", ...), where the first 20 rows are copied to file_01.csv, the next 20 to file_02.csv, and so on.
Currently I have this code, which hasn't worked for me so far.
import csv
import os.path
from itertools import islice

N = 20
new_filename = ""
filename = ""
with open(filename, "rb") as file:  # the a opens it in append mode
    reader = csv.reader(file)
    for i in range(N):
        line = next(file).strip()
        #print(line)
with open(new_filename, 'wb') as outfh:
    writer = csv.writer(outfh)
    writer.writerow(line)
    writer.writerows(islice(reader, 2))
I have attached a file for testing.
https://1drv.ms/u/s!AhdJmaLEPcR8htYqFooEoYUwDzdZbg
32.01,18.42,58.98,33.02,55.37,63.25,12.82,-32.42,33.99,179.53,
41.11,33.94,67.85,57.61,59.23,94.69,19.43,-19.15,21.71,-161.13,
49.80,54.12,72.78,100.74,56.97,128.84,26.95,-6.76,10.07,-142.62,
55.49,81.02,68.93,148.17,49.25,157.32,34.94,5.39,0.44,-123.32,
56.01,112.81,59.27,177.87,38.50,179.63,43.43,18.42,-5.81,-102.24,
50.79,142.87,48.06,-162.32,26.60,-161.21,52.38,34.37,-7.42,-79.64,
41.54,167.36,37.12,-145.93,15.01,-142.84,60.90,57.05,-4.47,-56.54,
30.28,-172.09,27.36,-130.24,5.11,-123.66,66.24,91.12,-0.76,-35.44,
18.64,-153.20,19.52,-114.09,-1.54,-102.96,64.77,131.32,5.12,-21.68,
7.92,-134.07,14.24,-96.93,-3.79,-80.91,57.10,162.35,12.51,-9.21,
-0.34,-113.74,11.80,-78.73,-2.49,-58.46,46.75,-175.86,20.81,2.87,
-4.81,-91.85,11.78,-60.28,0.59,-39.26,35.75,-158.12,29.79,15.71,
-4.76,-68.67,13.79,-43.84,6.82,-24.69,25.27,-141.56,39.05,30.71,
-1.33,-46.42,18.44,-30.23,14.53,-11.95,16.21,-124.45,47.91,50.25,
4.14,-29.61,24.89,-18.02,23.01,0.10,9.59,-106.05,54.46,77.07,
11.04,-15.39,32.33,-6.66,31.92,12.48,6.24,-86.34,55.72,110.53,
18.69,-2.32,40.46,4.57,41.11,26.87,6.07,-65.68,50.25,142.78,
26.94,10.56,49.18,16.67,49.92,45.39,8.06,-46.86,40.13,168.29,
35.80,24.58,58.45,31.99,56.83,70.92,12.96,-31.90,28.10,-171.07,
44.90,41.72,67.41,55.89,59.21,103.94,19.63,-18.67,15.97,-152.40,
-5.41,-77.62,11.40,-63.21,4.80,-29.06,31.33,-151.44,43.00,37.25,
-2.88,-54.38,13.08,-46.00,12.16,-15.86,21.21,-134.62,51.25,59.16,
1.69,-35.73,17.44,-32.01,20.37,-3.78,13.06,-117.10,56.18,88.98,
8.15,-20.80,23.70,-19.66,29.11,8.29,7.74,-98.22,54.91,123.30,
15.52,-7.45,31.04,-8.22,38.22,21.78,5.76,-77.99,47.34,153.31,
23.53,5.38,39.07,2.98,47.29,38.71,6.58,-57.45,36.18,176.74,
32.16,18.76,47.71,14.88,55.08,61.71,9.76,-40.52,23.99,-163.75,
41.27,34.36,56.93,29.53,59.23,92.75,15.53,-26.40,12.16,-145.27,
49.92,54.65,66.04,51.59,57.34,126.97,22.59,-13.65,2.14,-126.20,
55.50,81.56,72.21,90.19,49.88,155.84,30.32,-1.48,-4.71,-105.49,
55.92,113.45,70.26,139.40,39.23,178.48,38.55,10.92,-7.09,-83.11,
50.58,143.40,61.40,172.50,27.38,-162.27,47.25,24.86,-4.77,-60.15,
41.30,167.74,50.34,-166.33,15.74,-143.93,56.21,43.14,-0.54,-38.22,
30.03,-171.78,39.24,-149.48,5.71,-124.87,63.77,70.19,4.75,-24.15,
18.40,-152.91,29.17,-133.78,-1.18,-104.31,66.51,108.81,11.86,-11.51,
7.69,-133.71,20.84,-117.74,-3.72,-82.28,61.95,146.15,20.05,0.65,
-0.52,-113.33,14.97,-100.79,-2.58,-59.75,52.78,172.46,28.91,13.29,
-4.91,-91.36,11.92,-82.84,0.34,-40.12,41.93,-167.91,38.21,27.90,
These are some of the problems with your current solution:
You created a csv.reader object but then never used it.
You read each line but did not store it anywhere.
You are not keeping track of 20 rows, which was supposed to be your requirement.
You created the output file in a separate with block, which no longer has access to the read lines or to the csv.reader object.
Here's a working solution:

import csv

inp_file = "input.csv"
out_file_pattern = "file_{:{fill}2}.csv"
max_rows = 20

with open(inp_file, "r") as inp_f:
    reader = csv.reader(inp_f)
    all_rows = []
    cur_file = 1
    for row in reader:
        all_rows.append(row)
        if len(all_rows) == max_rows:
            with open(out_file_pattern.format(cur_file, fill="0"), "w") as out_f:
                writer = csv.writer(out_f)
                writer.writerows(all_rows)
            all_rows = []
            cur_file += 1
The flow is as follows:
Read each row of the CSV using a csv.reader
Store each row in an all_rows list
Once that list gets 20 rows, open a file and write all the rows to it
Use the csv.writer's writerows method
Use a cur_file counter to format the filename
Every time 20 rows are dumped to a file, empty out the list and increment the file counter
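One caveat with the flow above: only full batches of 20 are written, so if the total row count is not a multiple of 20, the final partial batch in all_rows is silently dropped. A self-contained sketch of a chunking helper that also flushes the remainder (a variant for illustration, not the exact code above):

```python
def split_rows(rows, max_rows=20):
    # Yield successive chunks of at most max_rows rows each.
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == max_rows:
            yield chunk
            chunk = []
    if chunk:  # flush the final partial chunk too
        yield chunk

# 45 fake rows split into chunks of 20, 20 and 5 rows
rows = [[str(i)] for i in range(45)]
chunks = list(split_rows(rows))
print([len(c) for c in chunks])  # [20, 20, 5]
```

Each chunk could then be handed to csv.writer's writerows exactly as in the solution above.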
This solution counts blank lines as part of the 20 rows. Your test file actually has 19 rows of CSV data plus 1 blank line per block. If you need to skip the blank lines, just add a simple check:
if not row:
    continue
Also, as I mentioned in a comment, I assume the input file is an actual CSV file, meaning a plain text file with CSV-formatted data. If the input is actually an Excel file, then solutions like this won't work; you'd need special libraries to read Excel files, even if the contents visually look like CSV and even if you rename the file to .csv.
Without using any special CSV libraries (e.g. csv; you could, I just don't know how to use them, and I don't think it is necessary for this case), you could:

excel_csv_fp = open(r"<file_name>", "r", encoding="utf-8")  # Check the proper encoding for your file
csv_data = excel_csv_fp.readlines()

file_counter = 0
new_file_name = ""
new_fp = ""
for line in csv_data:
    if line.strip() == "":  # blank line; readlines() keeps the "\n", so compare the stripped line
        if new_fp != "":
            new_fp.close()
        file_counter += 1
        new_file_name = "file_" + "{:02d}".format(file_counter)  # 1 turns into 01 and 10 remains 10
        new_fp = open("<some_path>/" + new_file_name + ".csv", "w", encoding="utf-8")  # Makes a new CSV file to start writing to
    elif new_fp != "":  # Make sure new_fp is a file pointer and not the initial empty string
        new_fp.write(line)
If you have any questions on any of the code (how it works, why I choose what etc.), just ask in the comments and I'll try to reply as soon as possible.

Update a CSV row if it exists within a random sample

I have a CSV from which I am selecting a random sample of 500 rows using the following code:

import csv
import random

with open('Original.csv', "rb") as source:
    lines = [line for line in source]
random_choice = random.sample(lines, 500)

What I'd like to do is update a column called [winner] for any row that appears in the sample, and then save the result back to a csv file, but I have no idea how to achieve this.
There is a unique identifier in a column called [ID].
How would I go about doing this?
Starting with a CSV that looks like this:
ID something winner
1 a
2 b
3 c
4 a
5 d
6 a
7 b
8 e
9 f
10 g
You could use the following approach. The whole file is read in, rows are chosen by a randomly selected index, and written back out to the file.
import csv
import random

# Read in the data
with open('example.csv', 'r') as infile:
    reader = csv.reader(infile)
    header = next(reader)  # We want the headers, but not as part of the sample
    data = []
    for row in reader:
        data.append(row)

# Find the column called winner
winner_column_index = header.index('winner')

# Pick some random indices which will be used to select the sample
all_indices = list(range(len(data)))
sampled_indices = random.sample(all_indices, 5)

# Add the winner column to the selected rows
for index in sampled_indices:
    data[index][winner_column_index] = 'Winner'

# Write the data back
with open('example_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(header)   # Make sure we get the headers back in
    writer.writerows(data)    # Write the rest of the data
This will give the following output:
ID something winner
1 a
2 b Winner
3 c
4 a Winner
5 d
6 a Winner
7 b
8 e
9 f Winner
10 g Winner
EDIT: It turns out that having the first column of the CSV being called ID is not a good idea if you want to open with Excel. It then incorrectly thinks the file is in SYLK format.
First, why are you using CSV and not a database? Even SQLite would be much easier (it's built in: import sqlite3).
Second, you'll need to write the whole file again. I suggest you turn your lines into lists and update them in place (lists are mutable references, so changing the inner values updates the data you will write out):

lines = [list(line) for line in source]

and then

for choice in random_choice:
    choice[WINNER_INDEX] += 1

and write the file.
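A minimal sketch of the SQLite suggestion, using an in-memory database and a made-up table layout (assumptions for illustration, not the asker's schema): load the rows, sample 500 IDs, and mark them with a batch of UPDATEs.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path instead to persist the data
conn.execute("CREATE TABLE rows (id INTEGER PRIMARY KEY, something TEXT, winner TEXT)")
conn.executemany("INSERT INTO rows (id, something) VALUES (?, ?)",
                 [(i, "x") for i in range(1, 1001)])

# sample 500 unique IDs and mark them as winners
ids = [r[0] for r in conn.execute("SELECT id FROM rows")]
winners = random.sample(ids, 500)
conn.executemany("UPDATE rows SET winner = 'Winner' WHERE id = ?",
                 [(i,) for i in winners])
conn.commit()

n = conn.execute("SELECT COUNT(*) FROM rows WHERE winner = 'Winner'").fetchone()[0]
print(n)  # 500
```

Unlike the CSV approach, there is no need to rewrite the whole file: the UPDATE touches only the sampled rows.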

Loop retrieve data from csv, append to file

I have created a Python 2.7 script that does the following:
1. Gets a list of filenames from a folder and writes them to a csv file, one per row.
2. Enters data into a search box on the web.
3. Writes the result from the search box into another csv file.
What I would like now is for the csv data in (1) to act as the input for (2), i.e. for each filename in the csv file, it conducts a search for that cell.
Additionally, instead of just writing the results into a second csv file in (3), I would like to append the result to the first csv file, or generate a new one with both columns.
I can provide the code, but since it's 50 lines already, I've tried to keep this question descriptive.
Update: proposed retrieval and append:

with open("file.csv", "a+") as f:
    r = csv.reader(f)
    wr = csv.writer(f, delimiter="\n")
    result = []
    for line in r:
        searchbox = driver.find_element_by_name("searchbox")
        searchbox.send_keys(line)
        sleep(8)
        search_reply = driver.find_element_by_class_name("search_reply")
        result = re.findall(r"((?<=\()[0-9]*)", search_reply.text)
        wr.writerow(result)
Open the file for reading and appending, store the output, then write at the end:

import csv

with open("first.csv", "a+") as f:
    r = csv.reader(f)
    wr = csv.writer(f, delimiter="\n")
    result = []
    for line in r:
        # process lines / step 2, append to result
        pass
    wr.writerow(result)
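One subtlety with "a+" mode is that the initial file position is at the end of the file, so a seek(0) is needed before any reading happens. A runnable sketch along those lines, where process() is a hypothetical stand-in for the search-box lookup:

```python
import csv

# Build a small input file for the demonstration (hypothetical filenames).
with open("first.csv", "w", newline="") as f:
    csv.writer(f).writerows([["file_a.txt"], ["file_b.txt"]])

def process(name):
    # Stand-in for the web search; returns some derived value.
    return len(name)

with open("first.csv", "a+", newline="") as f:
    f.seek(0)                   # "a+" starts at end-of-file; rewind to read
    rows = list(csv.reader(f))  # read everything before appending anything
    wr = csv.writer(f)
    for row in rows:
        wr.writerow([row[0], process(row[0])])  # append filename + result
```

After this runs, first.csv holds the two original rows followed by two new rows pairing each filename with its result. Reading everything into a list before writing avoids interleaving reads and appends on the same handle.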

Writing for loop outputs to CSV columns

I have a for loop that prints 4 details:

deats = soup.find_all('p')
for n in deats:
    print n.text

The output is 4 printed lines.
Instead of printing, what I'd like is for each n to be written to a different column of a .csv. Obviously, when I use a regular .write() it puts everything in the same column. In other words, how would I write each iteration of the loop to the next column?
You would create the csv row in a loop (or using a list comprehension; I will show the explicit loop for ease of reading, and you can change it to a single list-comprehension line yourself):

row = []
for n in deats:
    row.append(n.text)

Now you have row ready to write to the .csv file using csv.writer().
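In Python 3, that follow-up write could look like this (the column values are made up for illustration):

```python
import csv

row = ["a", "b", "c", "d"]  # hypothetical texts collected from the loop above

# writerow() emits one line, with one list element per column
with open("output.csv", "w", newline="") as f:
    csv.writer(f).writerow(row)

with open("output.csv", newline="") as f:
    print(next(csv.reader(f)))  # ['a', 'b', 'c', 'd']
```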
Hey, try it like this:

import csv

csv_output = csv.writer(open("output.csv", "wb"))  # output.csv is the output file name
csv_output.writerow(["Col1", "Col2", "Col3", "Col4"])  # set the first row to the column titles
temp = []
deats = soup.find_all('p')
for n in deats:
    temp.append(str(n.text))
csv_output.writerow(temp)
You can use the csv module for this:

import csv

with open('output.csv', 'wb') as csvfile:
    opwriter = csv.writer(csvfile, delimiter=',')
    opwriter.writerow([n.text for n in deats])

Or, without the csv module:

extra_stuff = ["pie", "cake", "eat", "too"]
some_file.write(",".join(n.text for n in deats) + "," + ",".join(str(s) for s in extra_stuff))

Is that all you are looking for?
