accessing the values of collections.defaultdict - python

I have a CSV file that I want to read column-wise. For that I have this code:
from collections import defaultdict
from csv import DictReader
columnwise_table = defaultdict(list)
with open("Weird_stuff.csv", newline='') as f:
    reader = DictReader(f)
    for row in reader:
        for col, dat in row.items():
            columnwise_table[col].append(dat)
#print(columnwise_table.items()) # this gives me everything
print(type(columnwise_table[2]))  # I'm looking for something like this
My question is: how can I get all the elements of only one specific column? I'm not using conda, and the matrix is big (2400x980).
UPDATE
I have 980 columns and over 2000 rows, and I need to work with the file by column, e.g. 1st column[0]: feature1, 2nd column[0]: j_ss01, 50th column: Abs2, and so on.
Since I can't access the dict using the column names, I would like to use an index for that. Is this possible?

import csv
import collections
col_values = collections.defaultdict(list)
# newline='' is the mode the csv module recommends in Python 3
with open('Weird_stuff.csv', newline='') as f:
    reader = csv.reader(f)
    # skip field names
    next(reader)
    for row in reader:
        for col, value in enumerate(row):
            col_values[col].append(value)
# for each numbered column you want...
col_index = 33  # for example
print(col_values[col_index])
If you know the columns you want in advance, only storing those columns could save you some space...
cols = {1, 5, 6, 234}
...
for col, value in enumerate(row):
    if col in cols:
        col_values[col].append(value)
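If the whole table fits in memory (2400x980 cells of text usually does), another way to get index-addressable columns is to transpose the rows with zip. This is only a sketch: the in-memory SAMPLE string and the read_columns helper are made up for illustration and stand in for the real file.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
SAMPLE = "feature1,j_ss01,Abs2\n1,a,0.5\n2,b,0.7\n3,c,0.9\n"


def read_columns(f):
    """Return (header, columns); columns[i] is a tuple of column i's values."""
    reader = csv.reader(f)
    header = next(reader)
    # zip(*rows) turns a sequence of rows into a sequence of columns.
    # Note that zip stops at the shortest row if the rows are ragged.
    columns = list(zip(*reader))
    return header, columns


header, columns = read_columns(io.StringIO(SAMPLE))
print(header[2], columns[2])
```

With a real file you would pass the object returned by open(path, newline='') instead of the StringIO.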

By iterating over row.items(), you get every column of every row.
If you want only one specific column by index number, use csv.reader and a column index instead.
import csv
col_values = []
# Column index number to get values from
col = 1
with open("Weird_stuff.csv", newline='') as f:
    # csv.reader is used directly so the name "reader" is not shadowed
    for row in csv.reader(f):
        col_val = row[col]
        col_values.append(col_val)
# contains only values from column index <col>
print(col_values)
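If you would rather keep the DictReader / defaultdict layout from the question, one possible way to look a column up by position is to go through the reader's fieldnames list, which holds the header names in file order. A minimal sketch, assuming made-up sample data (SAMPLE) and a hypothetical helper name (read_by_name):

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for the real CSV file.
SAMPLE = "feature1,j_ss01,Abs2\n1,a,0.5\n2,b,0.7\n"


def read_by_name(f):
    """Return (names, table): header names in file order and a dict of columns."""
    reader = csv.DictReader(f)
    table = defaultdict(list)
    for row in reader:
        for name, value in row.items():
            table[name].append(value)
    # DictReader.fieldnames keeps the header names in their original order,
    # so a positional index can be translated into a column name.
    return reader.fieldnames, table


names, table = read_by_name(io.StringIO(SAMPLE))
col_index = 1
print(names[col_index], table[names[col_index]])
```

This keeps the name-keyed dict but lets you address a column as table[names[i]].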

Related

Skip Rows in CSV Containing Specific String

I have a list of strings (longer than in this example). If one of the strings exists in a row of data, I want to skip that row. This is what I have so far but I get an index error, which leads me to believe I'm not looping correctly.
stringList = ["ABC", "AAB", "AAA"]
with open('filename.csv', 'r') as csvfile:
    filereader = csv.reader(csvfile, delimiter=',')
    next(filereader, None) #Skip header row
    for row in filereader:
        for k in stringList:
            if k not in row:
                data1 = column[1]
The error I get: IndexError: list index out of range. I realize I'm reading by row, but I need to extract the data by column.
The error arises because row is a list, so individual columns have to be accessed by index, and an index that does not exist in a given row (a short or blank line, for example) raises IndexError.
You can access a column by indexing the list row. For example, in the first iteration row[0] is the first column of the first row, row[1] is the second column, and so on. Each subsequent iteration gives you the same columns for the next row down.
Here's a simple loop to do it.
for row in filereader:
    # skip the row if any cell contains one of the strings
    if any(k in cell for k in stringList for cell in row):
        continue
    if len(row) > 1:
        someVar = row[1]
With pandas you can do it easily with a boolean mask.
import pandas as pd
data = pd.read_csv('filename.csv')
# keep only rows whose 'column_name' value is not one of the strings (exact match)
data = data.loc[~data['column_name'].isin(stringList)]
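If pandas is not an option, the same filtering can be sketched with the csv module alone. The sample data and the rows_without helper below are hypothetical, and the sketch assumes "exists in a row" means "appears as a substring of any cell":

```python
import csv
import io

# Hypothetical sample data; the real file would be opened with open(..., newline='').
SAMPLE = (
    "id,code,note\n"
    "1,XYZ,keep\n"
    "2,ABC,drop\n"
    "3,QRS,has AAB inside\n"
    "4,LMN,keep\n"
)
string_list = ["ABC", "AAB", "AAA"]


def rows_without(f, banned):
    """Yield data rows in which no cell contains any of the banned strings."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    for row in reader:
        if any(s in cell for cell in row for s in banned):
            continue
        yield row


kept = list(rows_without(io.StringIO(SAMPLE), string_list))
# Guard the index so short or blank rows cannot raise IndexError.
second_column = [row[1] for row in kept if len(row) > 1]
print(second_column)
```

For exact cell matches instead of substrings, the test could be changed to compare whole cells against a set of the banned strings.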

How to get the values occurring only once in first column of a csv file using python

I am new to Python and I'm trying to read a CSV with 700 lines, including a header, and get a list of the unique values of the first CSV column.
Sample CSV:
SKU;PRICE;SUPPLIER
X100;100;ABC
X100;120;ADD
X101;110;ABV
X102;100;ABC
X102;105;ABV
X100;119;ABG
I used the example here
How to create a list in Python with the unique values of a CSV file?
so I did the following:
import csv
mainlist=[]
with open('final_csv.csv', 'r', encoding='utf-8') as csvf:
    rows = csv.reader(csvf, delimiter=";")
    for row in rows:
        if row[0] not in rows:
            mainlist.append(row[0])
print(mainlist)
I noticed while debugging that rows is 1 line, not 700, and I get only the ['SKU'] field. What did I do wrong?
Thank you.
A solution using pandas. You'll need to call the unique method on the correct column; this returns an array of the unique values in that column, which you then convert to a list with the tolist method.
An example on the SKU column below.
import pandas as pd
df = pd.read_csv('final_csv.csv', sep=";")
sku_unique = df['SKU'].unique().tolist()
If you don't know or care about the column name, you can use iloc with the column's position. Note that the index starts at 0:
df.iloc[:,0].unique().tolist()
If the question is meant to get only the values that occur exactly once, you can use the value_counts method. This creates a series whose index holds the SKU values and whose values hold the counts; you then filter on the counts and convert the index to a list in a similar manner. Using the first example:
import pandas as pd
df = pd.read_csv('final_csv.csv', sep=";")
sku_counts = df['SKU'].value_counts()
sku_single_counts = sku_counts[sku_counts == 1].index.tolist()
If you want the unique values of the first column, you could modify your code to use a set instead of a list. Maybe like this:
import collections
import csv
filename = 'final_csv.csv'
sku_list = []
with open(filename, 'r', encoding='utf-8') as f:
    csv_reader = csv.reader(f, delimiter=";")
    for i, row in enumerate(csv_reader):
        if i == 0:
            # skip the header
            continue
        try:
            sku = row[0]
            sku_list.append(sku)
        except IndexError:
            pass
print('All SKUs:')
print(sku_list)
sku_set = set(sku_list)
print('SKUs after removing duplicates:')
print(sku_set)
c = collections.Counter(sku_list)
sku_list_2 = [k for k, v in c.items() if v == 1]
print('SKUs that appear only once:')
print(sku_list_2)
with open('output.csv', 'w') as f:
    for sku in sorted(sku_set):
        f.write('{}\n'.format(sku))
A solution using neither pandas nor csv:
with open('file.csv', 'r') as f:
    lines = f.read().splitlines()[1:]
col0 = [v.split(';')[0] for v in lines]
# a list comprehension rather than filter(), which is lazy in Python 3
uniques = [x for x in col0 if col0.count(x) == 1]
or, using map (but less readable):
with open('file.csv', 'r') as f:
    col0 = list(map(lambda line: line.split(';')[0], f.read().splitlines()[1:]))
uniques = list(filter(lambda x: col0.count(x) == 1, col0))
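To make the difference between "distinct values" and "values that occur only once" concrete, here is a small sketch run against the sample CSV from the question. The first_column helper is a hypothetical name, and the data is read from an in-memory string rather than final_csv.csv:

```python
import csv
import io
from collections import Counter

# The sample data from the question, held in memory for illustration.
SAMPLE = """SKU;PRICE;SUPPLIER
X100;100;ABC
X100;120;ADD
X101;110;ABV
X102;100;ABC
X102;105;ABV
X100;119;ABG
"""


def first_column(f, delimiter=";"):
    """Return the first-column values of every non-empty data row."""
    reader = csv.reader(f, delimiter=delimiter)
    next(reader, None)  # skip the header
    return [row[0] for row in reader if row]


skus = first_column(io.StringIO(SAMPLE))
# dict.fromkeys removes duplicates while keeping first-seen order.
distinct = list(dict.fromkeys(skus))
# Counter counts each value in a single pass over the list.
counts = Counter(skus)
once_only = [sku for sku in distinct if counts[sku] == 1]

print(distinct)   # ['X100', 'X101', 'X102']
print(once_only)  # ['X101']
```

Counting once with Counter avoids calling list.count for every element, which matters little at 700 rows but grows quadratically on larger files.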

How to find specific row in Python CSV module

I need to find the third row, from column 4 to the end, of a CSV file. How would I do that? I know I can find the values from the 4th column on with
row[3]
but how do I get specifically the third row?
You could convert the csv reader object into a list of lists... The rows are stored in a list, which contains lists of the columns.
So:
csvr = csv.reader(file)
csvr = list(csvr)
csvr[2] # The 3rd row
csvr[2][3] # The 4th column on the 3rd row.
csvr[-4][-3]# The 3rd column from the right on the 4th row from the end
You could keep a counter for counting the number of rows:
counter = 1
for row in reader:
    if counter == 3:
        print('Interested in third row')
    counter += 1
You could use itertools.islice to extract the row of data you wanted, then index into it.
Note that the rows and columns are numbered from zero, not one.
import csv
from itertools import islice
def get_row_col(csv_filename, row, col):
    # text mode with newline='' is what the csv module expects in Python 3
    with open(csv_filename, newline='') as f:
        return next(islice(csv.reader(f), row, row+1))[col]
This is very basic code that will do the job, and you can easily make a function out of it.
import csv
target_row = 3  # 1-based row number
target_col = 4  # 1-based column number
data = None
with open('yourfile.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for n, row in enumerate(reader, start=1):
        if n == target_row:
            # everything from the target column to the end of the row
            data = row[target_col - 1:]
            break
print(data)
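Putting the islice idea and the "column 4 to the end" requirement together, a possible helper could look like the sketch below. The function name row_tail, its 1-based arguments, and the sample data are all assumptions made for illustration:

```python
import csv
import io
from itertools import islice

# Hypothetical sample data; the first line counts as row 1 here.
SAMPLE = "a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n11,12,13,14,15\n"


def row_tail(f, row_number, first_col):
    """Return the cells of row `row_number` from column `first_col` onward.

    Both arguments are 1-based. Returns None if the file has fewer rows.
    """
    reader = csv.reader(f)
    # islice skips the earlier rows without storing them in memory.
    row = next(islice(reader, row_number - 1, row_number), None)
    if row is None:
        return None
    return row[first_col - 1:]


tail = row_tail(io.StringIO(SAMPLE), 3, 4)
print(tail)  # ['9', '10']
```

If the header should not count as a row, pass row_number + 1 or skip it with next(reader) before slicing.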

Python csv row count using column name

I have a CSV file with 'n' columns. I need to get the row count of each column using the column name and produce a dictionary of the following format:
csv_dict= {col_a:10,col_b:20,col_c:30}
where 10, 20 and 30 are the row counts of columns a, b and c respectively.
I obtained a list of columns using the fieldnames attribute of DictReader.
Now I need the row count of every column in my list.
This is what I tried:
for row in csv.DictReader(filename):
    col_count= sum(1 for row['col_a'] in re)+1
This just gets the row count of column a. How do I get the row counts of all the columns in my list and put them in a dictionary in the format above? Any help appreciated. Thanks and regards.
You can try this:
#Save this file with FileName.csv
Name,age,DOB
abhijeet,17,17/09/1990
raj,17,7/09/1990
ramesh,17,17/09/1990
rani,21,17/09/1990
mohan,21,17/09/1990
nil,25,17/09/1990
#Following is the python code.
import csv
from collections import defaultdict
columns = defaultdict(list) # each value in each column is appended to a list
with open('FileName.csv') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value
            if not v=='':
                # append the value into the appropriate list based on column name k
                columns[k].append(v)
print(len(columns['Name'])) #print the length of the specified column
print(len(columns['age'])) #print the length of the specified column
print(len(columns['DOB'])) #print the length of the specified column
I would use pandas!
import pandas as pd
# FULLNAME = path/filename.extension of the CSV file to read
data = pd.read_csv(FULLNAME, header=0)
# counting empty values
nan_values = data.isnull().sum()
# multiply by -1
ds = nan_values.multiply(-1)
# add total of rows from CSV
filled_rows = ds.add(len(data))
# create dict from data series
csv_dict = filled_rows.to_dict()
If you want to preserve column name order, use an OrderedDict:
from collections import OrderedDict
csv_dict_ordered = OrderedDict()
for idx in filled_rows.index:
    csv_dict_ordered[idx] = filled_rows[idx]
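For comparison, the dictionary asked for in the question can also be built directly with the standard library. This is a sketch under the assumption that "row count of a column" means the number of non-empty cells in that column; the sample data (with a few blanks added) and the non_empty_counts helper are made up for illustration:

```python
import csv
import io

# Hypothetical sample data with some empty cells.
SAMPLE = (
    "Name,age,DOB\n"
    "abhijeet,17,17/09/1990\n"
    "raj,,7/09/1990\n"
    "ramesh,17,\n"
    "rani,21,17/09/1990\n"
)


def non_empty_counts(f):
    """Return {column name: number of non-empty cells}, in header order."""
    reader = csv.DictReader(f)
    names = reader.fieldnames or []
    counts = dict.fromkeys(names, 0)
    for row in reader:
        for name in names:
            # Missing cells come back as None and blank cells as '',
            # both of which are falsy and therefore not counted.
            if row.get(name):
                counts[name] += 1
    return counts


csv_dict = non_empty_counts(io.StringIO(SAMPLE))
print(csv_dict)  # {'Name': 4, 'age': 3, 'DOB': 3}
```

Plain dicts keep insertion order in Python 3.7 and later, so the result follows the header order without an OrderedDict.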

Dynamically remove a column from a CSV

I want to dynamically remove a column from a CSV, this is what I have so far. I have no idea where to go from here though:
# Remove column not needed.
column_numbers_to_remove = 3,2,
file = upload.filepath
#I READ THE FILE
file_read = csv.reader(file)
# REMOVE columns 3 and 2 from the CSV
# UPDATE / SAVE CSV
Use enumerate to get the column index, and create a new row without the columns you don't want... eg:
for row in file_read:
    new_row = [col for idx, col in enumerate(row) if idx not in (3, 2)]
Then write out your rows using csv.writer somewhere...
Read the CSV and write it to another file after removing the columns.
import csv
# newline='' on both files, as the csv module recommends
with open('csv.csv', newline='') as src, open('csv2.csv', 'w', newline='') as dst:
    creader = csv.reader(src)
    cwriter = csv.writer(dst)
    for cline in creader:
        new_line = [val for col, val in enumerate(cline) if col not in (2,3)]
        cwriter.writerow(new_line)
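To make the column list dynamic rather than hard-coded, the same loop can be wrapped in a function that takes the indexes to drop. A minimal sketch, using in-memory streams so it runs on its own; drop_columns is a hypothetical name, not part of the csv module:

```python
import csv
import io


def drop_columns(src, dst, to_remove):
    """Copy CSV rows from src to dst, omitting the 0-based column indexes in to_remove."""
    to_remove = set(to_remove)  # set membership tests are O(1)
    reader = csv.reader(src)
    # lineterminator="\n" keeps the in-memory output easy to compare;
    # the csv module's default is "\r\n".
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([val for idx, val in enumerate(row) if idx not in to_remove])


src = io.StringIO("a,b,c,d,e\n1,2,3,4,5\n")
dst = io.StringIO()
column_numbers_to_remove = 3, 2
drop_columns(src, dst, column_numbers_to_remove)
result = dst.getvalue()
print(result)
```

With real files, src and dst would be the objects returned by open(path, newline='') and open(path, 'w', newline='').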
