I'm trying to count the commas row by row in a .csv file. Unfortunately it always comes up to zero.

import csv

with open('Test.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        numCommas = row.read().count(',')
        print numCommas

But I am always getting 0.
If you just want to count the commas and don't need the data, the csv module is not required:

with open('Test.csv', 'r') as csv_file:
    for line in csv_file:
        print(line.count(','))
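One caveat with counting commas on the raw line: commas inside quoted fields are counted too. A quick sketch of the difference (using io.StringIO with made-up data in place of a real file):

```python
import csv
import io

# A row whose quoted field itself contains a comma
sample = 'name,comment\n"Smith, John","hello"\n'

# Raw counting sees 2 commas on the data line
# (including the one inside the quotes)
data_line = sample.splitlines()[1]
raw_count = data_line.count(',')

# csv.reader respects the quoting, so the row has 2 fields, i.e. 1 separator
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip header
row = next(reader)
csv_count = len(row) - 1

print(raw_count, csv_count)  # 2 1
```

If your data never contains quoted commas, the two counts agree and the plain-text approach is fine.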
On Python 2 you can do it this way, but you will have to change the delimiter:

csv_reader = csv.reader(csv_file, delimiter='\t')
numCommas = row[0].count(',')

With , as the delimiter, the row looks like this:

['a', 'd', 'f', 'g', 'h']

With \t as the delimiter, the row looks like this:

['a,d,f,g,h']

This way you get the number of commas for each row, rather than the total count.
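A self-contained sketch of that delimiter trick (io.StringIO with made-up data standing in for the file):

```python
import csv
import io

text = "a,d,f,g,h\n"

# With ',' as the delimiter, the commas are consumed as separators
row_comma = next(csv.reader(io.StringIO(text), delimiter=','))
print(row_comma)  # ['a', 'd', 'f', 'g', 'h']

# With '\t' as the delimiter, each line stays one field, commas intact
row_tab = next(csv.reader(io.StringIO(text), delimiter='\t'))
print(row_tab)               # ['a,d,f,g,h']
print(row_tab[0].count(','))  # 4
```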
Just read the whole file and count ',':

with open('Test.csv') as csv_file:
    count = csv_file.read().count(',')
print(count)
The code you have shared will raise an error, because inside your for loop row is a list, and a list object has no attribute 'read'.

You have used csv.reader, so it gives you each row in the form of a list: when you iterate over the csv_reader object in the for loop, the row variable is of type list.

If you want to count the number of columns in each row, you can simply print len(row) inside the for loop:

    print(len(row))

But if you want to count the commas themselves, you need to read the file without using csv.reader.
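For example, a minimal sketch of that approach (io.StringIO with made-up data standing in for the actual file):

```python
import io

# Simulating the file contents; in practice use open('Test.csv')
csv_file = io.StringIO("a,b,c\n1,2,3\n")

# Each line is a plain string, so str.count works directly
counts = [line.count(',') for line in csv_file]
print(counts)  # [2, 2]
```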
In your example, each row from csv_reader is a list, not a string separated by commas. When you read the file through csv.reader(), it breaks each row down into its columns, stores them in a list, and that list is what the reader object yields.

For your purposes, you can probably just use len(row) if you want the count of columns or items in the row.
It comes up as zero because the csv file is split on the delimiter, so the "row" variable is a list object.

You can get the count of commas in a str object like this:

a = "a,b,b,"
print(a.count(","))

This cannot be applied to a list, so you need to either apply count to the individual entries in your "row" variable (if you need a count for each entry in the row), or read the file as a text file and count the commas line by line, e.g. via the readlines method.
csv_reader = csv.reader(csv_file, delimiter=',')

This line splits each row on ',' and returns a list. For instance, a csv row could be

100, 200, 300, 400

The list returned would be [100, 200, 300, 400].

To count the commas, subtract 1 from the number of elements, i.e. 4 elements = 3 commas.

FIXED VERSION:

with open('Test.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        print(len(row) - 1)
Related
I have raw data.
I want to split this into csv/excel.
After that, if the data in the rows is not stored correctly (e.g. if 0 is entered instead of 121324), I want Python to identify those rows.
I mean that while splitting raw data into csv through Python code, some rows might be formed incorrectly (please understand).
How do I identify those rows through Python?
example:
S.11* N. ENGLAND L -8' 21-23 u44'\n
S.18 TAMPA BAY W -7 40-7 u49'\n
S.25 Buffalo L -4' 18-33 o48
result i want:
S,11,*,N.,ENGLAND,L,-8',21-23,u44'\n
S,18,,TAMPA,BAY,W,-7,40-7,u49'\n
S,25,,Buffalo,L,-4',18-33,o48\n
suppose the output is like this:
S,11,N.,ENGLAND,L,-8',21-23u,44'\n
S,18,,TAMPA,BAY,W,-7,40-7,u49'\n
S,25,,Buffalo,L,-4',18-33,o48\n
You can see that in the first row the * is missing and u44' is stored as only 44', with the u appended to another column.
This row should be identified by the Python code and returned to me.
Likewise, I want all the rows with errors.
This is what I have done so far:

import csv

input_filename = 'rawsample.txt'
output_filename = 'spreads.csv'

with open(input_filename, 'r', newline='') as infile, \
     open(output_filename, 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter=' ', skipinitialspace=True)
    writer = csv.writer(outfile, delimiter=',')
    for row in reader:
        new_cols = row[0].split('.')
        if not new_cols[1].endswith('*'):
            new_cols.extend([''])
        else:
            new_cols[1] = new_cols[1][:-1]
            new_cols.extend(['*'])
        row = new_cols + row[1:]
        # print(row)
        writer.writerow(row)

er = []
for index, row in df.iterrows():
    for i in row:
        if str(i).lower() == 'nan' or i == '':
            er.append(row)
# I was able to check for null values but nothing more.
please help me.
@mozway is right, you'd better give an example input and expected result.
Anyway, if you're dealing with a variable number of columns in the input, please refer to Handling Variable Number of Columns with Pandas - Python.
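As a rough sketch of that idea (assuming pandas and a made-up three-row input): passing explicit column names wide enough for the longest row lets read_csv pad short rows with NaN, which then makes incomplete rows easy to flag:

```python
import io
import pandas as pd

# Made-up input: rows with 2, 4 and 3 fields respectively
raw = "a,b\na,b,c,d\na,b,c\n"

# Explicit names sized to the widest row; shorter rows get NaN padding
df = pd.read_csv(io.StringIO(raw), header=None, names=list(range(4)))

# Rows that did not fill every column show up with NaN values
incomplete = df[df.isna().any(axis=1)]
print(incomplete.index.tolist())  # [0, 2]
```

You would still need your own domain-specific checks (like the missing * above), but this flags structurally short rows cheaply.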
Best
I have a .csv file that I am trying to turn into a dict. I have tried pandas and csv.DictReader mostly, but so far I can only print the data (not in the way I want) with the DictReader.
The main problem is that the file looks like

header;data (1 column)

for about 50 rows, and after that it changes the schema, like

header1;header2;header3;header4
data1;data2;data3;data4 etc..

from row 50 onwards.
with open(filename, 'r', encoding='utf-16') as f:
    for line in csv.DictReader(f):
        print(line)

That's the code I have for now.
Thanks for your help.
You can't use DictReader for this, because it requires all the rows to have the same fields.
Use csv.reader and check the length of the row that it returns. When the length changes, treat that as a new header.
Hopefully you don't have adjacent sections of the file that have the same number of fields but different headers. It will be difficult for the script to detect when the section changes.
data = []
with open(filename, 'r', encoding='utf-16') as f:
    r = csv.reader(f, delimiter=';')
    # process first 52 rows in format header;data
    for _ in range(52):
        row = next(r)
        data.append({row[0]: row[1]})
    # rest of file is a header row followed by a variable number of data rows
    header = next(r)
    for row in r:
        if len(row) != len(header):  # new header
            header = row
            continue
        d = dict(zip(header, row))
        data.append(d)
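For instance, on a tiny made-up file with a 2-row header;data section (in place of the 52 above) followed by a tabular section, the same logic yields:

```python
import csv
import io

# Made-up stand-in for the real file: 2 key;value rows, then a table
sample = "name;Alice\ncity;Berlin\nh1;h2;h3\n1;2;3\n4;5;6\n"
r = csv.reader(io.StringIO(sample), delimiter=';')

data = []
for _ in range(2):                  # first section: header;data pairs
    row = next(r)
    data.append({row[0]: row[1]})

header = next(r)                    # second section: header then data rows
for row in r:
    data.append(dict(zip(header, row)))

print(data)
```

This prints [{'name': 'Alice'}, {'city': 'Berlin'}, {'h1': '1', 'h2': '2', 'h3': '3'}, {'h1': '4', 'h2': '5', 'h3': '6'}].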
I have a csv with two fields, 'positive' and 'negative'. I am trying to add the positive words to a list from the csv using the csv.DictReader class. Here is the code:
import csv

with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for n in csv_reader:
        if n == 'positive' and csv_reader[n] != None:
            positive_list.append(csv_reader[n])
However the program returns an empty list. Any idea how to get around this issue? Or what am I doing wrong?
That's because each row n yielded by csv_reader is a dict, so the comparison n == 'positive' is never true, and csv_reader[n] is not valid either: you can't index the reader, only the row. Also note that csv_reader is a generator you can only read through once.

With a little rearranging it should work fine:
import csv

with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for n in csv_reader:
        # Put any print statement inside the loop over the reader;
        # otherwise the generator will be exhausted by the time you run the logic.
        print(n)
        # n is a dict, so grab the right value from it.
        # If it contains a value, then do something with it:
        if n['positive']:
            # Use the row's data here.
            # Don't try to index the csv_reader - use the given row.
            positive_list.append(n['positive'])
Every row in DictReader is a dictionary, so you can retrieve column values using the column name as the key, like this:

positive_column_values = []
for row in csv_dict_reader:
    positive_column_value = row["positive"]
    positive_column_values.append(positive_column_value)

After executing this code, "positive_column_values" will hold all the values from the "positive" column.
You can replace this code with your code to get desired result:
import csv

with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = []
    for row in csv_reader:
        positive_list.append(row["positive"])
print(positive_list)
Here's a short way with a list comprehension. It assumes there is a header called header that holds (either) positive or negative values.
import csv

with open('pos_neg_cleaned.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    positive_list = [line for line in csv_reader if line.get('header') == 'positive']
print(positive_list)
alternatively if your csv's header is positive:
positive_list = [line for line in csv_reader if line.get('positive')]
I am writing some data into a csv file. I directly write a list to a row of the csv file, like below:

with open("files/data.csv", "wb") as f_csv:
    writer = csv.writer(f_csv, delimiter=',')
    writer.writerow(flux_inteplt)  # here flux_inteplt is a list
But when I read the data back like below:

with open('files/data.csv','rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        parts = row.split(",")
        print parts[0]

it fails with AttributeError: 'list' object has no attribute 'split'.
Does anyone have an idea how to approach this problem?
import csv

with open('us-cities.csv','rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        str1 = ''.join(row)  # convert the list into a string
        parts = str1.split(",")
        print parts[0]
row is already a list: when you iterate over the reader object, you get a list of values split by the delimiter you pass, so just use each row directly:

for row in reader:
    print row[0]  # first element from each row

If you have comma-separated values, use delimiter=',', not delimiter=' ' - which, given that you wrote the file with csv.writer(f_csv, delimiter=','), is what you have. The delimiter you pass when writing is what separates the elements of your input iterable, so when reading you need to use the same delimiter to get the same fields back.
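A minimal round-trip sketch illustrating this (io.StringIO standing in for the file, and a made-up list in place of flux_inteplt):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter=',')
writer.writerow(['1.5', '2.5', '3.5'])   # stand-in for flux_inteplt

buf.seek(0)
rows = list(csv.reader(buf, delimiter=','))  # same delimiter as when writing
first = rows[0][0]   # row is already a list; no split() needed
print(first)         # 1.5
```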
row is already a list. No need to split (:
I have a CSV file with 100 rows.
How do I read specific rows?
I want to read say the 9th line or the 23rd line etc?
You could use a list comprehension to filter the file like so:
with open('file.csv') as fd:
    reader = csv.reader(fd)
    interestingrows = [row for idx, row in enumerate(reader) if idx in (28, 62)]
# now interestingrows contains rows 28 and 62 (0-based, counting from the first line of the file)
Use list to grab all the rows at once as a list. Then access your target rows by their index/offset in the list. For example:
#!/usr/bin/env python
import csv

with open('source.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    rows = list(csv_reader)
print(rows[8])   # 9th row
print(rows[22])  # 23rd row
You simply skip the necessary number of rows:

with open("test.csv", "rb") as infile:
    r = csv.reader(infile)
    for i in range(8):  # count from 0 to 7
        next(r)         # and discard the rows
    row = next(r)       # "row" contains row number 9 now
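The same skip can also be written with itertools.islice, which discards the leading rows for you (sketch with made-up in-memory data standing in for the file):

```python
import csv
import io
from itertools import islice

# 100 made-up rows: r0a,r0b ... r99a,r99b
sample = "".join("r{0}a,r{0}b\n".format(i) for i in range(100))
reader = csv.reader(io.StringIO(sample))

# islice skips rows 0-7 and yields row index 8, i.e. the 9th row
ninth = next(islice(reader, 8, 9))
print(ninth)  # ['r8a', 'r8b']
```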
You could read all of them and then use a normal list to find them:

with open('bigfile.csv','rb') as longishfile:
    reader = csv.reader(longishfile)
    rows = [r for r in reader]
print rows[9]
print rows[88]

If you have a massive file, this can kill your memory, but if the file has fewer than 10,000 lines you shouldn't run into any big slowdowns.
You can do something like this:

with open('raw_data.csv') as csvfile:
    readCSV = list(csv.reader(csvfile, delimiter=','))
row_you_want = readCSV[index_of_row_you_want]
Maybe this could help you: using pandas you can easily do it with loc.

# Reading the 3rd record using pandas -> loc
# Note: the index starts from 0,
# so to read the third record use 3-1 -> 2.
# loc[[2], :] -> read the third row, and ':' -> the entire row's details
import pandas as pd

df = pd.read_csv('employee_details.csv')
df.loc[[2], :]