Python assistance skipping rows in csv

I am trying to write Python code that counts the number of rows in a csv file, but ignores rows containing a certain text (zzz, for example). I have been able to count the rows successfully, but I do not know how to write the code so it ignores rows that contain zzz when counting. Any help with this, or at least a pointer to something to read, would be great.
import csv

filename = r"name"
with open(filename, 'r') as csvf:
    reader = csv.reader(csvf)
    lines = len(list(reader))
    print(lines)
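A minimal sketch of one approach, assuming a row should be ignored whenever any of its fields contains the marker text (the file name and the marker below are placeholders):

import csv

filename = r"name"   # placeholder, as in the question
marker = "zzz"       # rows containing this text are not counted

with open(filename, 'r', newline='') as csvf:
    reader = csv.reader(csvf)
    # count only rows where no field contains the marker
    lines = sum(1 for row in reader if not any(marker in field for field in row))

print(lines)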

Related

Import csv: remove filename from column names in first row

I am using Python 3.5, and I have several csv files.
The files are named according to a fixed structure: a fixed prefix (always the same) plus a varying filename part:
099_2019_01_01_filename1.csv
099_2019_01_01_filename2.csv
My original csv files look like this:
filename1-Streetname filename1-ZIPCODE
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
Street1 2012932
Street2 3023923
filename2-Name filename2-Phone
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
Name1 2012932
Name2 3023923
I am manipulating these files using the following code (I am reading the csv files from a source folder and writing them to a destination folder. I am skipping certain rows as I do not want to include this information):
I cut off the TEXT rows, as I do not need them:
import csv
import os

skiprows = (1, 2, 3, 4, 5, 6)
for file in os.listdir(sourcefolder):
    with open(os.path.join(sourcefolder, file)) as fp_in:
        reader = csv.reader(fp_in, delimiter=';')
        rows = [row for i, row in enumerate(reader) if i not in skiprows]
    with open(os.path.join(destinationfolder, file), 'w', newline='') as fp_out:
        writer = csv.writer(fp_out)
        writer.writerows(rows)
This code works and gives:
filename1-Streetname filename1-ZIPCODE
Street1 2012932
Street2 3023923
filename2-Name filename2-Phone
Name1 2012932
Name2 3023923
The first row contains the header. In the header names there is always the filename (however without the 099_2019_01_01_ prefix) plus a "-". The filename ending .csv is missing. I want to remove this "filename-" for each csv file.
The core task now is to get the first row and perform the replace only on that row. I need to cut off the prefix and the .csv ending and then do a general replace. The first replace could be something like this:
Either I could start with a function that cuts off the first n characters, as the length is fixed, or,
according to this solution, just use string.removeprefix('099_2019_01_01_').
As I am on Python 3.5 I cannot use removeprefix, so I try to simply replace it:
string.replace("099_2019_01_01_","")
Then I need to remove the .csv which is easy:
string.replace(".csv","")
Putting this together I get (string.replace("099_2019_01_01_","")).replace(".csv",""). (The trailing "-" needs to be removed too, see the code below.) I am not sure if this works.
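A quick check of that chained replace on a sample file name (the name is just an illustration):

file = "099_2019_01_01_filename1.csv"
stripped = file.replace("099_2019_01_01_", "").replace(".csv", "")
# stripped is now "filename1"; together with the trailing "-" this is the
# prefix that has to be removed from each header cell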
My main problem with this csv import code is that I do not know how to manipulate only the first row when reading/writing the csv. I want to perform the replace only in the first row. I tried something like this:
import csv
import os

skiprows = (1, 2, 3, 4, 5, 6)
for file in os.listdir(sourcefolder):
    with open(os.path.join(sourcefolder, file)) as fp_in:
        reader = csv.reader(fp_in, delimiter=';')
        rows = [row for i, row in enumerate(reader) if i not in skiprows]
    with open(os.path.join(destinationfolder, file), 'w', newline='') as fp_out:
        writer = csv.writer(fp_out)
        rows[0].replace((file.replace("099_2019_01_01_","")).replace(".csv","")+"-","")
        writer.writerows(rows)
This gives an error as the idea with rows[0] is not working. How can I do this?
(I am not sure whether I should include this replacing in this code or put it into a second script that runs after the first one. In that case, however, I would have to read and write the csv files again, so I think it would be most efficient to implement it in this code; otherwise I need to open, change and save every file again. If it is not possible to include it here, I would also be fine with a stand-alone script that just does the replacing, assuming the csv files have row 0 as the header followed by the data.)
Please note that I do want to go this way with csv and not use pandas.
EDIT:
At the end the csv files should look like this:
Streetname ZIPCode
Street1 9999
Street2 9848
Name Phone
Name1 23421
Name2 23232
Try replacing this:
rows[0].replace((file.replace("099_2019_01_01_","")).replace(".csv","")+"-","")
with this in your code:
x=file.replace('099_2019_01_01_','').replace('.csv', '')
rows[0]=[i.replace(x+'-', '') for i in rows[0]]
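For reference, a sketch of how that fix might sit in the full loop from the question (sourcefolder and destinationfolder are assumed to be defined as before):

import csv
import os

skiprows = (1, 2, 3, 4, 5, 6)
for file in os.listdir(sourcefolder):
    with open(os.path.join(sourcefolder, file)) as fp_in:
        reader = csv.reader(fp_in, delimiter=';')
        rows = [row for i, row in enumerate(reader) if i not in skiprows]
    with open(os.path.join(destinationfolder, file), 'w', newline='') as fp_out:
        writer = csv.writer(fp_out)
        # strip the "filename-" prefix from each header cell before writing
        x = file.replace('099_2019_01_01_', '').replace('.csv', '')
        rows[0] = [i.replace(x + '-', '') for i in rows[0]]
        writer.writerows(rows)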

Python: Trying to extract and output rows from one csv file to another csv file

As per the title, I'm attempting to write a python script to read a csv file, filter through it to see which rows I need, and output the filtered rows into a separate csv file.
So far I am able to read the csv files with:
import csv

with open('list.csv') as f:
    csv_f = csv.reader(f)
and I am storing 3 of the rows in a tuple and comparing it against another list to see if there is a match. If there is a match, I want the row containing the tuple to be output to a new csv file.
I have successfully been able to read the files, match the tuples with another list and output which have been matched as text. The problem is I do not know how to then output the rows that match the tuple into a new csv file.
I was thinking to assign a row number to each tuple but that did not go anywhere either.
I want to know the best way to effectively output the rows I need.
Using the csv module, this could be a more elegant solution:
import csv

with open('input.csv', 'r') as inp, open('output', 'w', newline='') as outp:
    csv_f = csv.reader(inp)
    csv_o = csv.writer(outp)
    for line in csv_f:
        if line == 'something':  # replace with your matching condition
            csv_o.writerow(line)
Open both files. Iterate through the lines in the file that you read from and, if your condition evaluates to True, write the line to the output file.
with open('list.csv', 'r') as rf:
    with open('output.csv', 'w') as wf:
        # Read lines
        for read_line in rf:
            if <your condition>:
                # Write to the file
                wf.write(read_line)
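If the condition is a match on a tuple of fields, as described in the question, a sketch along these lines might fit better (the column positions, file names and the allowed set are assumptions; adjust them to your data):

import csv

allowed = {('a', 'b', 'c'), ('x', 'y', 'z')}  # hypothetical tuples to keep

with open('list.csv', 'r', newline='') as inp, \
        open('filtered.csv', 'w', newline='') as outp:
    reader = csv.reader(inp)
    writer = csv.writer(outp)
    for row in reader:
        key = (row[0], row[2], row[5])  # hypothetical column positions
        if key in allowed:
            writer.writerow(row)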

How can I open multiple csv files in a folder, take the average of a column and save in a separate file using python?

I am extremely new at python and need some help with this one. I've tried various codes and none seem to work, so suggestions would be awesome.
I have a folder with about 1500 csv files that each contain multiple columns of data. I need to take the average of the first column called "agr" and save this value in a different excel or csv file. It would be great if I could also somehow save the name of the file with its averaged value so that I can keep track of which file it came from. The name of the files are crop_city (e.g. corn_omaha).
import glob
import csv
import numpy as np
import pandas as pd

path = ('C:/test/*.csv')
for fname in glob.glob(path):
    with open(fname) as csvfile:
        agr = []
        reader = csv.DictReader(fname)
        print row['agr']
I know the code above is extremely rudimentary, so any help would be great thanks everyone!
Assuming the first column in these CSV files is a decimal or float, you don't really need to parse the entire line. Just split at the first separator and parse the first token. There is no real advantage to numpy or pandas either. Just use the builtin sum function.
import glob
import os

path = ('test/*.csv')  # using local dir for test
with open('output.csv', 'w', newline='') as outfile:
    outfile.write("Filename,Sum\r\n")  # header for output
    for fname in glob.glob(path):
        with open(fname) as csvfile:
            next(csvfile)  # skip header
            outfile.writelines("{},{}\r\n".format(
                os.path.basename(fname),
                sum(float(line.split(',', 1)[0].strip()) for line in csvfile)))
Contrary to the answer by #tdelaney, I would not advise you to limit your code by relying on the fact that you are adding up the first column; what if you need to work with the third column next week? It's easy to do this properly by building on the code you provide. Parsing a couple of thousand text files is not going to slow you down.
The csv.DictReader constructor will automatically treat the first row of its input as a header (unless you explicitly specify a list of column names with the fieldnames parameter). So your code can look like this:
import csv
import glob

averages = []
for fname in glob.glob(path):
    with open(fname, "rb") as csvfile:
        reader = csv.DictReader(csvfile)
        values = [float(row["agr"]) for row in reader]
        avg = sum(values) / len(values)
        averages.append((fname, avg))
The list averages now contains the numbers you want. This is how you write it out to another CSV file:
with open("avegages.csv", "wb") as outfile:
writer = csv.writer(outfile)
writer.writerow(["File", "Average agr"])
for row in averages:
writer.writerow(row)
PS. Since you included pandas in your imports, here's one way to do the same thing with pandas. However, I recommend sticking with csv for now. The pandas object model is complex, and hard to wrap your head around.
averages = []
for fname in glob.glob(path):
    data = pd.DataFrame.from_csv(fname)
    averages.append((fname, data["agr"].mean()))
df_out = pd.DataFrame.from_records(averages, columns=["File", "Average agr"])
df_out.to_csv("averages.csv", index=False)
As you can see the code is a lot shorter, since file i/o and calculations can be done with one statement.

csv.writer.writerows missing rows from list

I'm new to python.
I have a list with 19188 rows that I want to save as a csv.
When I write the list's rows to the csv, it does not have the last rows (it stops at 19112).
Do you have any idea what might cause this?
Here is how I write to the csv:
import csv

mycsvfile = open('file.csv', 'w')
thedatawriter = csv.writer(mycsvfile, lineterminator='\n')
list = []
#list creation code
thedatawriter.writerows(list)
Each row of list has 4 string elements.
Another piece of information:
If I create a list that contains only the last elements that are missing and add them to the csv file, it kind of works (it is added, but twice...).
mycsvfile = open('file.csv', 'w')
thedatawriter = csv.writer(mycsvfile, lineterminator = '\n')
list = []
#list creation code
thedatawriter.writerows(list)
list_end = []
#list_end creation code
thedatawriter.writerows(list_end)
If I try to add the list_end alone, it doesn't seem to be working. I'm thinking there might be a csv writing parameter that I got wrong.
Another piece of information:
If I open the file adding ", newline=''", then it writes more rows to it (though not all):
mycsvfile = open('file.csv', 'w', newline='')
There must be a simple mistake in the way I open or write to the csv (or in the dialect?)
Thanks for your help!
I found my answer! I was not closing the filehandle before the script ended, which left unwritten rows.
Here is the fix:
with open('file.csv', 'w', newline='') as mycsvfile:
    thedatawriter = csv.writer(mycsvfile, lineterminator='\n')
    thedatawriter.writerows(list)
See: Writing to CSV from list, write.row seems to stop in a strange place
Close the filehandle before the script ends. Closing the filehandle will also flush any strings waiting to be written. If you don't flush and the script ends, some output may never get written. Using the with open(...) as f syntax is useful because it will close the file for you when Python leaves the with-suite. With with, you'll never omit closing a file again.
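For comparison, a minimal sketch of the explicit alternative when a with block is not used (the data here is illustrative):

import csv

rows = [['a', 'b', 'c', 'd']] * 3  # illustrative rows

mycsvfile = open('file.csv', 'w', newline='')
thedatawriter = csv.writer(mycsvfile, lineterminator='\n')
thedatawriter.writerows(rows)
mycsvfile.close()  # without this, buffered rows may never reach disk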

How to extract data from rows in .csv file into separate .txt files using python?

I have a CSV file of interview transcripts exported from an h5 file. When I read the rows into python, the output looks something like this:
line[0]=['title,date,responses']
line[1]=['[\'Transcript 1 title\'],"[\' July 7, 1997\']","[ '\nms. vogel: i look at all sectors of insurance, although to date i\nhaven\'t really focused on the reinsurers and the brokers.\n']']
line[2]=['[\'Transcript 2 title\'],"[\' July 8, 1997\']","[ '\nmr. tozzi: i formed cambridge in 1981. we are top-down sector managers,\nconstantly searching for non-consensus companies and industries.\n']']
etc...
I'd like to extract the text from the "responses" column ONLY into separate .txt files for every row in the CSV file, saving the .txt files into a specified directory and naming them as "t1.txt", "t2.txt", etc. according to the row number. The CSV file has roughly 30K rows.
Drawing from what I've already been able to find online, this is the code I have so far:
import csv

with open("twst.csv", "r") as f:
    reader = csv.reader(f)
    rownumber = 0
    for row in reader:
        g = open("t" + str(rownumber) + ".txt", "w")
        g.write(row)
        rownumber = rownumber + 1
        g.close()
My biggest problem is that this pulls all columns from the row into the .txt file, but I only want the text from the "responses" column. Once I have that, I know I can loop through the various rows in the file (right now, what I have set up is just to test the first row), but I haven't found any guidance on pulling specific columns in the python documentation. I'm also not familiar enough with python to figure out the code on my own.
Thanks in advance for the help!
There may be something that can be done with the built-in csv module. However, if the format of the csv does not change, the following code should work by just using for loops and built-in read/write.
with open('test.csv', 'r') as file:
    data = file.read().split('\n')

for row in range(1, len(data)):
    third_col = data[row].split(',')
    with open('t' + str(row) + '.txt', 'w') as output:
        output.write(third_col[2])
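For reference, a sketch of the csv-module variant hinted at above; unlike the plain split, csv.reader copes with quoted fields that themselves contain commas or newlines (the file name and the assumption that "responses" is the third column come from the question):

import csv

with open('twst.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for rownumber, row in enumerate(reader, start=1):
        with open('t' + str(rownumber) + '.txt', 'w') as out:
            out.write(row[2])  # third column: the "responses" text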
