Loop retrieve data from csv, append to file - python

I have created a Python 2.7 script that does the following:
1. Gets a list of filenames from a folder and writes them to a csv file, one per row.
2. Enters data into a search box on the web.
3. Writes the result from the search box into another csv file.
What I would like now is for the csv data in (1) to act as the input for (2), i.e. for each filename in the csv file, the script conducts a search for that cell.
Additionally, instead of just writing the results into a second csv file in (3), I would like to append the result to the first csv file, or generate a new one with both columns.
I can provide the code, but since it's 50 lines already, I've just tried to keep this question descriptive.
Update: Proposed retrieval and append:
with open("file.csv","a+") as f:
    r = csv.reader(f)
    wr = csv.writer(f, delimiter="\n")
    result = []
    for line in r:
        searchbox = driver.find_element_by_name("searchbox")
        searchbox.send_keys(line)
        sleep(8)
        search_reply = driver.find_element_by_class_name("search_reply")
        result = re.findall("((?<=\()[0-9]*)", search_reply.text)
    wr.writerow(result)

Open for reading and appending, store the output then write at the end:
import csv
with open("first.csv","a+") as f:
    r = csv.reader(f)
    wr = csv.writer(f,delimiter="\n")
    result = []
    for line in r:
        # process lines/step 2
        # append to result
        pass
    wr.writerow(result)
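For the "new file with both columns" variant mentioned in the question, a minimal Python 3 sketch might look like the following. It avoids reading and appending through one "a+" handle, since in Python 3 such a file typically starts positioned at its end and the reader may see no rows. The function name `add_result_column` and the `lookup` callable are hypothetical; `lookup` stands in for the Selenium search step and is not part of the original script.

```python
import csv


def add_result_column(in_path, out_path, lookup):
    """Copy each row of in_path to out_path with lookup(first cell) appended."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:  # skip blank lines
                continue
            # lookup is a placeholder for the web search; it receives the filename cell
            writer.writerow(row + [lookup(row[0])])
```

In the original script, `lookup` could wrap the `send_keys` call and the `re.findall` on the reply text, returning whatever should go into the second column.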

Related

How to read a csv file and create a new csv file after every nth number of rows?

I'm trying to write a function that reads an existing .csv file and copies every 20 rows to a newly created csv file. It therefore needs a file counter ("file_01, file_02, file_03, ..."), where the first 20 rows are copied to file_01.csv, the next 20 to file_02.csv, and so on.
Currently I have this code, which hasn't worked for me so far.
import csv
import os.path
from itertools import islice
N = 20
new_filename = ""
filename = ""
with open(filename, "rb") as file: # the a opens it in append mode
    reader = csv.reader(file)
    for i in range(N):
        line = next(file).strip()
        #print(line)
with open(new_filename, 'wb') as outfh:
    writer = csv.writer(outfh)
    writer.writerow(line)
    writer.writerows(islice(reader, 2))
I have attached a file for testing.
https://1drv.ms/u/s!AhdJmaLEPcR8htYqFooEoYUwDzdZbg
32.01,18.42,58.98,33.02,55.37,63.25,12.82,-32.42,33.99,179.53,
41.11,33.94,67.85,57.61,59.23,94.69,19.43,-19.15,21.71,-161.13,
49.80,54.12,72.78,100.74,56.97,128.84,26.95,-6.76,10.07,-142.62,
55.49,81.02,68.93,148.17,49.25,157.32,34.94,5.39,0.44,-123.32,
56.01,112.81,59.27,177.87,38.50,179.63,43.43,18.42,-5.81,-102.24,
50.79,142.87,48.06,-162.32,26.60,-161.21,52.38,34.37,-7.42,-79.64,
41.54,167.36,37.12,-145.93,15.01,-142.84,60.90,57.05,-4.47,-56.54,
30.28,-172.09,27.36,-130.24,5.11,-123.66,66.24,91.12,-0.76,-35.44,
18.64,-153.20,19.52,-114.09,-1.54,-102.96,64.77,131.32,5.12,-21.68,
7.92,-134.07,14.24,-96.93,-3.79,-80.91,57.10,162.35,12.51,-9.21,
-0.34,-113.74,11.80,-78.73,-2.49,-58.46,46.75,-175.86,20.81,2.87,
-4.81,-91.85,11.78,-60.28,0.59,-39.26,35.75,-158.12,29.79,15.71,
-4.76,-68.67,13.79,-43.84,6.82,-24.69,25.27,-141.56,39.05,30.71,
-1.33,-46.42,18.44,-30.23,14.53,-11.95,16.21,-124.45,47.91,50.25,
4.14,-29.61,24.89,-18.02,23.01,0.10,9.59,-106.05,54.46,77.07,
11.04,-15.39,32.33,-6.66,31.92,12.48,6.24,-86.34,55.72,110.53,
18.69,-2.32,40.46,4.57,41.11,26.87,6.07,-65.68,50.25,142.78,
26.94,10.56,49.18,16.67,49.92,45.39,8.06,-46.86,40.13,168.29,
35.80,24.58,58.45,31.99,56.83,70.92,12.96,-31.90,28.10,-171.07,
44.90,41.72,67.41,55.89,59.21,103.94,19.63,-18.67,15.97,-152.40,
-5.41,-77.62,11.40,-63.21,4.80,-29.06,31.33,-151.44,43.00,37.25,
-2.88,-54.38,13.08,-46.00,12.16,-15.86,21.21,-134.62,51.25,59.16,
1.69,-35.73,17.44,-32.01,20.37,-3.78,13.06,-117.10,56.18,88.98,
8.15,-20.80,23.70,-19.66,29.11,8.29,7.74,-98.22,54.91,123.30,
15.52,-7.45,31.04,-8.22,38.22,21.78,5.76,-77.99,47.34,153.31,
23.53,5.38,39.07,2.98,47.29,38.71,6.58,-57.45,36.18,176.74,
32.16,18.76,47.71,14.88,55.08,61.71,9.76,-40.52,23.99,-163.75,
41.27,34.36,56.93,29.53,59.23,92.75,15.53,-26.40,12.16,-145.27,
49.92,54.65,66.04,51.59,57.34,126.97,22.59,-13.65,2.14,-126.20,
55.50,81.56,72.21,90.19,49.88,155.84,30.32,-1.48,-4.71,-105.49,
55.92,113.45,70.26,139.40,39.23,178.48,38.55,10.92,-7.09,-83.11,
50.58,143.40,61.40,172.50,27.38,-162.27,47.25,24.86,-4.77,-60.15,
41.30,167.74,50.34,-166.33,15.74,-143.93,56.21,43.14,-0.54,-38.22,
30.03,-171.78,39.24,-149.48,5.71,-124.87,63.77,70.19,4.75,-24.15,
18.40,-152.91,29.17,-133.78,-1.18,-104.31,66.51,108.81,11.86,-11.51,
7.69,-133.71,20.84,-117.74,-3.72,-82.28,61.95,146.15,20.05,0.65,
-0.52,-113.33,14.97,-100.79,-2.58,-59.75,52.78,172.46,28.91,13.29,
-4.91,-91.36,11.92,-82.84,0.34,-40.12,41.93,-167.91,38.21,27.90,
These are some of the problems with your current solution.
You created a csv.reader object but then you did not use it
You read each line but then you did not store them anywhere
You are not keeping track of 20 rows which was supposed to be your requirement
You created the output file in a separate with block which does not have access anymore to the read lines or the csv.reader object
Here's a working solution:
import csv
inp_file = "input.csv"
out_file_pattern = "file_{:{fill}2}.csv"
max_rows = 20
with open(inp_file, "r") as inp_f:
    reader = csv.reader(inp_f)
    all_rows = []
    cur_file = 1
    for row in reader:
        all_rows.append(row)
        if len(all_rows) == max_rows:
            with open(out_file_pattern.format(cur_file, fill="0"), "w") as out_f:
                writer = csv.writer(out_f)
                writer.writerows(all_rows)
            all_rows = []
            cur_file += 1
The flow is as follows:
Read each row of the CSV using a csv.reader
Store each row in an all_rows list
Once that list gets 20 rows, open a file and write all the rows to it
Use the csv.writer's writerows method
Use a cur_file counter to format the filename
Every time 20 rows are dumped to a file, empty out the list and increment the file counter
This solution includes the blank lines as part of the 20 rows. Your test file actually has 19 rows of CSV data and 1 row for a blank line. If you need to skip the blank line, just add a simple check at the top of the for loop:
if not row:
    continue
Also, as I mentioned in a comment, I assume that the input file is an actual CSV file, meaning it's a plain text file with CSV formatted data. If the input is actually an Excel file, then solutions like this won't work, because you'll need some special libraries to read Excel files, even if the contents visually look like CSV or even if you rename the file to .csv.
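One detail of the csv-based loop above is that it only writes when the list reaches 20 rows, so a final group with fewer rows would never reach a file. A possible variant that also writes the remainder and skips blank lines is sketched below; the function name `split_csv` and its parameters are illustrative and not part of the original answer.

```python
import csv


def split_csv(inp_file, out_pattern="file_{:02d}.csv", max_rows=20):
    """Write every max_rows non-blank rows of inp_file to a new numbered file.

    Returns the list of file names that were written.
    """
    written = []

    def flush(rows):
        # The file number is one more than the count of files written so far
        name = out_pattern.format(len(written) + 1)
        with open(name, "w", newline="") as out_f:
            csv.writer(out_f).writerows(rows)
        written.append(name)

    with open(inp_file, newline="") as inp_f:
        chunk = []
        for row in csv.reader(inp_f):
            if not row:  # blank line: do not count it as one of the rows
                continue
            chunk.append(row)
            if len(chunk) == max_rows:
                flush(chunk)
                chunk = []
        if chunk:  # leftover rows that did not fill a whole group
            flush(chunk)
    return written
```

Calling `split_csv("input.csv")` would produce file_01.csv, file_02.csv, and so on in the current directory, with the last file holding whatever rows remain.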
You could also do this without any special CSV library such as csv (I am not familiar with it, and I don't think it is necessary for this case). The following starts a new output file at every blank line:
excel_csv_fp = open(r"<file_name>", "r", encoding="utf-8") # Check proper encoding for your file
csv_data = excel_csv_fp.readlines()
excel_csv_fp.close()
file_counter = 0
new_file_name = ""
new_fp = None
for line in csv_data:
    if line.strip() == "": # readlines() keeps the trailing newline, so compare the stripped line
        if new_fp is not None:
            new_fp.close()
        file_counter += 1
        new_file_name = "file_" + "{:02d}".format(file_counter) # 1 turns into 01 and 10 remains 10
        new_fp = open("<some_path>/" + new_file_name + ".csv", "w", encoding="utf-8") # Makes a new CSV file to start writing to
    elif new_fp is not None: # Only write once an output file has been opened; lines before the first blank line are skipped
        new_fp.write(line) # Write each line that follows a blank line
if new_fp is not None:
    new_fp.close() # Close the last output file
If you have any questions on any of the code (how it works, why I choose what etc.), just ask in the comments and I'll try to reply as soon as possible.

Writing to CSV file choosing the column to write in

I'm trying to import the data from X (6 in this case) CSV files containing some data about texts, and to put one specific column from each document into a new file, in such a way that they appear next to each other (the export from document 1 in column 1, from the second document in column 2, and so on). I've been unsuccessful so far.
# I have a list containing the path to all relevant files
files = ["path1", "path2", ...]
# then I tried cycling through the folder like this
for file in files:
    with open(file, "r") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        for row in reader:
            # I'm interested in the stuff stored on Column 2
            print([row[2]])
            # as you can see, I can get the info from the files, but from here
            # on, I can't find a way to then write that information on the
            # appropriate column of the newly created CSV file
I know how to open a writer, what I don't know is how to write a script that writes the info it fetches from the original 6 documents on a DIFFERENT COLUMN every time a new file is processed.
import csv
# I have a list containing the path to all relevant files
files = ["path1", "path2", ...]
newfile = "newpath1"
# then I tried cycling through the folder like this
for file in files:
    with open(file, "r") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        with open(newfile, "a") as wcsvfile:
            writer = csv.writer(wcsvfile)
            for row in reader:
                # I'm interested in the stuff stored on Column 2
                writer.writerow([row[2]])
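Note that the snippet above appends every value as a new row, so everything ends up in a single column. A csv writer emits whole rows, so one way to get a separate column per input file is to collect each file's column first and then transpose the result. The sketch below assumes Python 3; the function name `merge_columns` and its parameters are illustrative only.

```python
import csv
from itertools import zip_longest


def merge_columns(in_paths, out_path, column=2):
    """Write column `column` of each input file as its own column in out_path."""
    columns = []
    for path in in_paths:
        with open(path, newline="") as f:
            # Keep only rows that are long enough to have the wanted column
            columns.append([row[column] for row in csv.reader(f) if len(row) > column])
    with open(out_path, "w", newline="") as f:
        # zip_longest turns the per-file lists into output rows and pads
        # shorter files with empty cells
        csv.writer(f).writerows(zip_longest(*columns, fillvalue=""))
```

With the list from the question, this would be called as `merge_columns(files, newfile)`, giving one output column per entry in `files`.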

Pulling out data from CSV files' specific columns in Python

I need a quick help with reading CSV files using Python and storing it in a 'data-type' file to use the data to graph after storing all the data in different files.
I have searched for this, but in all the cases I found, there were headers in the data. My data does not have a header part. The columns are tab separated, and I need to store only specific columns of the data. Ex:
12345601 2345678#abcdef 1 2 365 places
In this case, as an example, I would want to store only "2345678#abcdef" and "365" in the new python file in order to use it in the future to create a graph.
Also, I have more than 1 csv file in a folder and I need to do it in each of them. The sources I found did not talk about it and only referred to:
# open csv file
with open(csv_file, 'rb') as csvfile:
Could anyone refer me to already answered question or help me out with it?
. . . and storing it in a PY file to use the data to graph after storing all the data in different files . . .
. . . I would want to store only "2345678#abcdef" and "365" in the new python file . . .
Are you sure that you want to store the data in a python file? Python files are supposed to hold python code and they should be executable by the python interpreter. It would be a better idea to store your data in a data-type file (say, preprocessed_data.csv).
To get a list of files matching a pattern, you can use python's built-in glob library.
Here's an example of how you could read multiple csv files in a directory and extract the desired columns from each one:
import glob
# indices of columns you want to preserve
desired_columns = [1, 4]
# change this to the directory that holds your data files
csv_directory = '/path/to/csv/files/*.csv'
# iterate over files holding data
extracted_data = []
for file_name in glob.glob(csv_directory):
    with open(file_name, 'r') as data_file:
        while True:
            line = data_file.readline()
            # stop at the end of the file
            if len(line) == 0:
                break
            # splits the line by whitespace
            tokens = line.split()
            # only grab the columns we care about
            desired_data = [tokens[i] for i in desired_columns]
            extracted_data.append(desired_data)
It would be easy to write the extracted data to a new file. The following example shows how you might save the data to a csv file.
output_string = ''
for row in extracted_data:
    output_string += ','.join(row) + '\n'
with open('./preprocessed_data.csv', 'w') as csv_file:
    csv_file.write(output_string)
Edit:
If you don't want to combine all the csv files, here's a version that can process one at a time:
def process_file(input_path, output_path, selected_columns):
    extracted_data = []
    with open(input_path, 'r') as in_file:
        while True:
            line = in_file.readline()
            if len(line) == 0: break
            tokens = line.split()
            extracted_data.append([tokens[i] for i in selected_columns])
    output_string = ''
    for row in extracted_data:
        output_string += ','.join(row) + '\n'
    with open(output_path, 'w') as out_file:
        out_file.write(output_string)
# whenever you need to process a file:
process_file(
    '/path/to/input.csv',
    '/path/to/processed/output.csv',
    [1, 4])
# if you want to process every file in a directory:
target_directory = '/path/to/my/files/*.csv'
for file in glob.glob(target_directory):
    process_file(file, file + '.out', [1, 4])
Edit 2:
The following example will process every file in a directory and write the results to a similarly-named output file in another directory:
import os
import glob
input_directory = '/path/to/my/files/*.csv'
output_directory = '/path/to/output'
for file in glob.glob(input_directory):
    file_name = os.path.basename(file) + '.out'
    out_file = os.path.join(output_directory, file_name)
    process_file(file, out_file, [1, 4])
If you want to add headers to the output, then process_file could be modified like this:
def process_file(input_path, output_path, selected_columns, column_headers=[]):
    extracted_data = []
    with open(input_path, 'r') as in_file:
        while True:
            line = in_file.readline()
            if len(line) == 0: break
            tokens = line.split()
            extracted_data.append([tokens[i] for i in selected_columns])
    output_string = ','.join(column_headers) + '\n'
    for row in extracted_data:
        output_string += ','.join(row) + '\n'
    with open(output_path, 'w') as out_file:
        out_file.write(output_string)
Here's another approach using a namedtuple that will help extract selected fields from a csv file and then let you write them out to a new csv file.
from collections import namedtuple
import csv
# Setup named tuple to receive csv data
# p1 to p6 are arbitrary field names associated with the csv file
SomeData = namedtuple('SomeData', 'p1, p2, p3, p4, p5, p6')
# Read data from the csv file and create a generator object to hold a reference to the data
# We use a generator object rather than a list to reduce the amount of memory our program will use
# The captured data will only have data from the 2nd & 5th column from the csv file
datagen = ((d.p2, d.p5) for d in map(SomeData._make, csv.reader(open("mydata.csv", "r"))))
# Write the data to a new csv file
with open("newdata.csv","w", newline='') as csvfile:
    cvswriter = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    # Use the generator created earlier to access the filtered data and write it out to a new csv file
    for d in datagen:
        cvswriter.writerow(d)
Original Data in "mydata.csv":
12345601,2345678#abcdef,1,2,365,places
4567,876#def,0,5,200,noplaces
Output Data in "newdata.csv":
2345678#abcdef,365
876#def,200
EDIT 1:
For tab delimited data make the following changes to the code:
change
datagen = ((d.p2, d.p5) for d in map(SomeData._make, csv.reader(open("mydata.csv", "r"))))
to
datagen = ((d.p2, d.p5) for d in map(SomeData._make, csv.reader(open("mydata2.csv", "r"), delimiter='\t', quotechar='"')))
and
cvswriter = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
to
cvswriter = csv.writer(csvfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
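For headerless tab-separated input like the sample line in the question, the same extraction can probably be done with csv.reader alone, without a namedtuple. The following is only a sketch assuming Python 3; the function name `extract_columns` and the default indices (1 and 4, matching "2345678#abcdef" and "365" in the sample) are illustrative.

```python
import csv


def extract_columns(in_path, out_path, indices=(1, 4)):
    """Copy the columns at `indices` from a tab-separated file to a comma-separated one."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter="\t"):
            # Skip blank or short lines that lack one of the wanted columns
            if len(row) > max(indices):
                writer.writerow([row[i] for i in indices])
```

To handle every file in a folder, this could be called inside a `for path in glob.glob(...)` loop, as in the earlier answer.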

Unable to read CSV file in a nested loop

import csv
s = open('models.csv')
checkIt = csv.reader(s)
o = open('data.csv')
csv_o = csv.reader(o)
for c in checkIt:
    abc = c[0].split(".")
    abcd = abc[2]
    commodity_type = abcd[6:]
    print(commodity_type)
    for csv in csv_o:
        print(csv)
        print(commodity_type)
The print calls in the inner loop execute only once, but they should execute 4 times because I have 4 rows in the models.csv file.
Please suggest a solution so that the nested for loop runs once for each row in models.csv.
Try resetting the file pointer that csv_o points to.
    for csv in csv_o:
        print(csv)
        print(commodity_type)
    o.seek(0)
That should automatically make the CSV reader begin reading from the start of the file from the next iteration onwards.
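An alternative to rewinding the file, sketched here under the assumption that data.csv fits in memory, is to read the inner file into a list once, since a list can be iterated any number of times while a csv.reader is exhausted after one pass. The in-memory sample data below is made up for illustration, and the inner loop variable is named `row` so that it does not shadow the csv module.

```python
import csv
import io

# Stand-ins for models.csv and data.csv (hypothetical contents)
models = io.StringIO("a.b.prefixgold\na.b.prefixoil\n")
data = io.StringIO("1,2\n3,4\n")

data_rows = list(csv.reader(data))  # read the inner file once

pairs = []
for model in csv.reader(models):
    commodity_type = model[0].split(".")[2][6:]
    for row in data_rows:  # runs in full for every row of models
        pairs.append((commodity_type, row))
```

With two model rows and two data rows, `pairs` ends up with four entries, one per combination.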

Python csv to dictionary with first line as title

I have a file file.csv with some data:
fn,ln,tel
john,doe,023322
jul,dap,024322
jab,sac,0485
I would like to have an array that I can access like this:
file = 'file.csv'
with open(file,'rU') as f:
    reader = csv.DictReader(f)
print reader[0].fn
So I would like that it prints the first name from the first record. Unfortunately, I get this error:
ValueError: I/O operation on closed file
How can I get it done so that I don't need to keep the file opened and that I can play with my array. Btw, I don't need to write back in the csv file, I just need to use the data and for that, an array that I can modify would be best.
You need to access the reader within the with block, not outside of it:
file = 'file.csv'
with open(file,'rU') as f:
    reader = csv.DictReader(f)
    first_row = next(reader)
    print first_row['fn']
As soon as you move code outside the block, the f file object is closed and you cannot obtain rows from the reader anymore. This is kind of the point of the with statement.
If you want to have random access to all rows in the file, convert the reader to a list first:
file = 'file.csv'
with open(file,'rU') as f:
    reader = csv.DictReader(f)
    all_rows = list(reader)
print all_rows[0]['fn']
The list() call will iterate over the reader, adding each result yielded to the list object until all rows are read. Make sure you have enough memory to hold all those rows.
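The snippets above are Python 2. A rough Python 3 equivalent is sketched below; the function name `load_rows` is illustrative. Each row comes back as a dictionary keyed by the header line, and the resulting list can be modified freely after the file has been closed.

```python
import csv


def load_rows(path):
    """Return all rows of a CSV file as a list of dicts keyed by the header line."""
    with open(path, newline="") as f:
        # list() consumes the reader while the file is still open
        return list(csv.DictReader(f))
```

For the sample file in the question, `load_rows('file.csv')[0]['fn']` would give the first name of the first record.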
