Creating a versatile data class in Python

I'm hoping you good folks can help with a project I'm working on. Essentially, I am trying to create a class that will take as an input a CSV file, examine the file for the number of columns of data, and store that data in key, value pairs in a dictionary. The code I have up to this point is below:
import csv

class DataStandard():
    '''class to store and examine columnar data saved as a csv file'''
    def __init__(self, file_name):
        self.file_name = file_name
        self.full_data_set = {}
        with open(self.file_name) as f:
            reader = csv.reader(f)
            # get labels of each column in list format
            self.col_labels = next(reader)
            # find the number of columns of data in the file
            self.number_of_cols = len(self.col_labels)
            # initialize lists to store data using column label as key
            for label in self.col_labels:
                self.full_data_set[label] = []
The piece I am having a hard time with comes after the dictionary (full_data_set) is created: I'm not sure how to loop through the remainder of the CSV file and store the data under the respective key for each column. Everything I have tried so far hasn't worked because of how I have to loop through the csv.reader object.
I hope this question makes sense, but please feel free to ask any clarifying questions. Also, if you can think of an approach that would work in a better, more pythonic way, I would appreciate the input. This is one of my first self-guided projects on classes, so the subject is fairly new to me. Thanks in advance!

To read the remaining rows you can use for row in reader:
data = []
with open('test.csv') as f:
    reader = csv.reader(f)
    headers = next(reader)
    for row in reader:
        d = dict(zip(headers, row))
        #print(d)
        data.append(d)
print('data:', data)
As @PM2Ring said, the csv module has DictReader:
with open('test.csv') as f:
    reader = csv.DictReader(f)
    data = list(reader)
print('data:', data)
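If the goal is the column-keyed dictionary from the original question, the DictReader rows can also be folded into one dict of lists. A minimal sketch, with a hypothetical in-memory sample standing in for test.csv:

```python
import csv
import io

# Hypothetical sample standing in for the asker's CSV file
sample = io.StringIO("name,age\nalice,30\nbob,25\n")

reader = csv.DictReader(sample)
# one empty list per column label
full_data_set = {label: [] for label in reader.fieldnames}
for row in reader:
    for label, value in row.items():
        full_data_set[label].append(value)

print(full_data_set)  # {'name': ['alice', 'bob'], 'age': ['30', '25']}
```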

This might give you ideas towards a solution. It assumes that the labels are only on row 1, that the rest is data, and that a row's length becomes 0 when there is no data:
import csv

class DataStandard():
    '''class to store and examine columnar data saved as a csv file'''
    def __init__(self, file_name):
        self.file_name = file_name
        self.full_data_set = {}
        # modify the method to the following:
        with open(self.file_name) as f:
            reader = csv.reader(f)
            for row_number, row in enumerate(reader):
                if row_number == 0:
                    # the first row holds the labels of each column
                    self.col_labels = row
                    # find the number of columns of data in the file
                    self.number_of_cols = len(self.col_labels)
                    # initialize lists to store data using column label as key
                    for label in self.col_labels:
                        self.full_data_set[label] = []
                elif len(row) != 0:
                    # append each value to the list for its column
                    for i in range(self.number_of_cols):
                        label = self.col_labels[i]
                        self.full_data_set[label].append(row[i])
...My one concern is that while the 'with open(...)' is valid, in my experience the levels of indentation can get unwieldy. In that case, to reduce nesting, you could split the header and data handling into separate 'with open(...)' blocks, i.e. read row 1, close, reopen, then read the data rows.

Related

Parse csv file that has subtables of unequal dimensions

I Have a csv file with the following sample data:
[Network]
Network Settings
RECORDNAME,DATA
UTDFVERSION,8
Metric,0
yellowTime,3.5
allRedTime,1.0
Walk,7.0
DontWalk,11.0
HV,0.02
PHF,0.92
[Nodes]
Node Data
INTID,TYPE,X,Y,Z,DESCRIPTION,CBD,Inside Radius,Outside Radius,Roundabout Lanes,Circle Speed
1,1,111152,12379,0,,,,,,
2,1,134346,12311,0,,,,,,
3,3,133315,12317,0,,,,,,
4,1,133284,13574,0,,,,,,
I need help figuring out how to place this into two separate tables using python. I have the following code so far, but I get a KeyError on "if row['RECORDNAME'] == 'Network Settings':" when I try to use it.
import csv

# Open the file
with open('filename.csv', 'r') as f:
    # Create a reader
    reader = csv.DictReader(f)
    # Initialize empty lists for the tables
    network_table = []
    nodes_table = []
    # Loop through the rows
    for row in reader:
        # Check if the row contains the "RECORDNAME" key
        if 'RECORDNAME' in row:
            # Check if the row belongs to the "Network" or "Nodes" section
            if row['RECORDNAME'] == 'Network Settings':
                # Add the row to the "Network" table
                network_table.append(row)
            elif row['INTID'] is not None:
                # Add the row to the "Nodes" table
                nodes_table.append(row)
# Print the tables
print(network_table)
print(nodes_table)
Any suggestions would be appreciated.
This is one way to solve it: read the file one table at a time, then store that table's data. Comments added to the code; hope they help.
# Gist - read the rows, and determine when the data starts for the next category
import csv
import re

result = {}  # store each table as one item, keyed by its category name
with open('node.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    for e in csvreader:
        if re.search(r'\[.*\]', ''.join(e)):  # found the pattern [....], example [Network]
            # get the key, so we know which category the following data belongs to
            key_dict = ''.join(next(csvreader))
            key_data = []
            table_data = True
            while table_data:
                try:
                    detail_data = next(csvreader)
                except StopIteration:
                    break  # handle the end of file
                if not ''.join(detail_data):  # a blank row ends the table
                    table_data = False
                else:
                    key_data.append(detail_data)
            result[key_dict] = key_data  # store the data
Each table can then be accessed like result['Network Settings'].
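Since each stored table keeps its column-header row first (e.g. the RECORDNAME,DATA line), a section is easy to convert into per-row dicts afterwards. A hedged sketch, using a hypothetical section in place of real parser output:

```python
# Hypothetical stand-in for one parsed section: the header row first,
# then the data rows, mirroring how the parser stores each table
section_rows = [
    ['INTID', 'TYPE', 'X'],
    ['1', '1', '111152'],
    ['2', '1', '134346'],
]

# split off the header and zip it against every data row
header, *data = section_rows
records = [dict(zip(header, row)) for row in data]
print(records[0]['X'])  # 111152
```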

Reading columns of data for every point in a csv file in Python

I want to read the second column of data, with the title nodes, and assign it to a variable with the same name for each point of t1.
import csv
with open('Data_10x10.csv', 'r') as f:
    csv_reader = csv.reader(f)
The data looks like
csv_reader = csv.reader(f) returns an iterator, so you can skip the headers by executing heading = next(csv_reader).
I would just use a dictionary data_t1 for storing the node data, keyed by the value of the t1 column.
Try the one below.
import csv
with open('Data_10x10.csv', 'r') as f:
    data_t1 = {}
    csv_reader = csv.reader(f)
    # Skips the heading
    heading = next(csv_reader)
    for row in csv_reader:
        data_t1[row[0]] = row[1]
Accessing the data (the key should be the value of your t1 column, in this case '0', '1', etc.):
print(data_t1['0'])
print(data_t1['1'])
If you want to create dynamic variables with the same name for each point of t1, that is a really bad idea. If your csv has a lot of rows, maybe millions, it will create millions of variables. So use a dictionary with keys and values.
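To see why the dictionary scales where dynamic variables cannot, here is a small sketch; the rows and column meanings are hypothetical:

```python
# Each pair is (t1, nodes); one dict holds them all,
# no matter how many rows the file has
rows = [['0', '1.5'], ['1', '2.5'], ['2', '3.5']]

nodes_by_t1 = {t1: float(nodes) for t1, nodes in rows}
print(nodes_by_t1['1'])  # 2.5
```

One lookup structure replaces an open-ended set of variable names, and membership tests ('1' in nodes_by_t1) come for free.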

Updating a specific csv column based on randomname

My code pulls a random name from a csv file. When a button is pressed, I want my code to search through the csv file and update the cell next to the name generated previously in the code.
The variable in which the name is stored is called name.
The index which pulls the random name from the csv file is stored in the variable y.
The function looks like this. I have asked this question previously but had no luck receiving answers, so I have made edits to the function and hopefully made it clearer.
namelist_file = open('StudentNames&Questions.csv')
reader = csv.reader(namelist_file)
writer = csv.writer(namelist_file)
rownum = 0
array = []
for row in reader:
    if row == name:
        writer.writerow([y], "hello")
Only the first two columns of the csv file are relevant
This is the function which pulls a random name from the csv file.
def NameGenerator():
    namelist_file = open('StudentNames&Questions.csv')
    reader = csv.reader(namelist_file)
    rownum = 0
    array = []
    for row in reader:
        if row[0] != '':
            array.append(row[0])
        rownum = rownum + 1
    length = len(array) - 1
    i = random.randint(1, length)
    global name
    name = array[i]
    return name
There are a number of issues with your code:
You're trying to have both a reader object and a writer on the same file at the same time. Instead, you should read the file contents in, make any changes necessary and then write the whole file back out at the end.
You need to open the file in write mode in order to actually make changes to the contents. Currently, you don't specify what mode you're using so it defaults to read mode.
row is actually a list representing all data in the row. Therefore, it cannot be equal to the name you're searching for; only its 0th index might be.
The following should work:
with open('StudentNames&Questions.csv', 'r') as infile:
    reader = csv.reader(infile)
    data = [row for row in reader]

for row in data:
    if row[0] == name:
        row[1] = int(row[1]) + 1  # csv values are strings, so convert before incrementing

with open('StudentNames&Questions.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(data)

What is the best way to overwrite a specific row in a csv by its index in Python 2.7

I have a python script that appends 4 strings to the end of my csv file. The first column is the user's email address, and I want to search the csv to see if that user's email address is already in the file. If it is, I want to overwrite that whole row with my 4 new strings; if not, I want to continue appending to the end. I have it searching the first column for the email, and if it is there it will give me the row.
with open('Mycsvfile.csv', 'rb') as f:
    reader = csv.reader(f)
    indexLoop = []
    for i, row in enumerate(reader):
        if userEmail in row[0]:
            indexLoop.append(i)
f.close()
with open("Mycsvfile.csv", 'ab') as file222:
    writer = csv.writer(file222, delimiter=',')
    lines = (userEmail, userDate, userPayment, userStatus)
    writer.writerow(lines)
file222.close()
I want to do something like this: if the email is in a row, it will give me the row index and I can use that to overwrite the whole row with my new data. If it isn't there, I will just append to the bottom of the file.
Example:
with open('Mycsvfile.csv', 'rb') as f:
    reader = csv.reader(f)
    new_rows = []
    indexLoop = []
    for i, row in enumerate(reader):
        if userEmail in row[0]:
            indexLoop.append(i)
            new_row = row + indexLoop(userEmail, userDate, userPayment, userStatus)
            new_rows.append(new_row)
        else:
            print "userEmail doesn't exist"
            #(i'd insert my standard append statement here.)
f.close
#now open csv file and writerows(new_row)
For this, you're better off using Pandas rather than the csv module. That way you can read the whole file into memory, modify it, and then write it back to a file.
Be aware, though, that modifying DataFrames in place is slow, so if you have a lot of data to add, you're better off transforming the frame into a dictionary and back.
import pandas as pd

file_path = r"/Users/tob/email.csv"
columns = ["email", "foo", "bar", "baz"]
df = pd.read_csv(file_path, header=None, names=columns, index_col="email")
data = df.to_dict('index')
for email, foo, bar, baz in information:
    row = {"foo": foo, "bar": bar, "baz": baz}
    data[email] = row
df = pd.DataFrame.from_dict(data, orient="index")
df.to_csv(file_path)
Where information is whatever your script returned.
First, you don't need to call the close function when using with; Python does it for you.
If you have the index you can do:
with open("myFile.csv", "r+") as f:
    # gives you a list of the lines
    contents = f.readlines()
    # delete the old line and insert the new one
    contents.pop(index)
    contents.insert(index, value)
    # join all lines and write them back from the start of the file
    contents = "".join(contents)
    f.seek(0)
    f.write(contents)
    f.truncate()
But I would recommend doing all the operations in one function, because it doesn't make a lot of sense to open the file, iterate over its lines, close it, then reopen it to update it.
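Putting that advice together, the whole read-modify-write can live in one function. A sketch under assumptions: an in-memory stand-in replaces the real file, and rows are matched on their first cell:

```python
import csv
import io

def update_row(lines, key, new_values):
    # Read every row, replace any row whose first cell matches key,
    # and return the rewritten rows for the caller to write back out
    rows = list(csv.reader(lines))
    for i, row in enumerate(rows):
        if row and row[0] == key:
            rows[i] = new_values
    return rows

# Hypothetical in-memory stand-in for the CSV file
src = io.StringIO("a@x.com,old\nb@y.com,keep\n")
updated = update_row(src, 'a@x.com', ['a@x.com', 'new'])
print(updated)  # [['a@x.com', 'new'], ['b@y.com', 'keep']]
```

With a real file, the same function works by passing an open file object in, then writing the returned rows with csv.writer in a second with block.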

Appending data to csv file

I am trying to append 2 data sets to my csv file. Below is my code. The code runs, but my data gets appended below a set of data in the first column (i.e. col[0]). I would, however, like to append my data sets in separate columns at the end of the file. Could I please get advice on how I might be able to do this? Thanks.
import csv

Trial = open('Trial_test.csv', 'rt', newline='')
reader = csv.reader(Trial)
Trial_New = open('Trial_test.csv', 'a', newline='')
writer = csv.writer(Trial_New, delimiter=',')
Cortex = []
Liver = []
for col in reader:
    Cortex_Diff = float(col[14])
    Liver_Diff = float(col[17])
    Cortex.append(Cortex_Diff)
    Liver.append(Liver_Diff)
Avg_diff_Cortex = sum(Cortex)/len(Cortex)
Data1 = str(Avg_diff_Cortex)
Avg_diff_Liver = sum(Liver)/len(Liver)
Data2 = str(Avg_diff_Liver)
writer.writerows(Data1 + Data2)
Trial.close()
Trial_New.close()
I think I see what you are trying to do. I won't rewrite your function entirely for you, but here's a tip: assuming you are dealing with a dataset of manageable size, try reading your entire CSV into memory as a list of lists (or list of tuples), perform your calculations on this object, then write the python object back out to the new CSV in a separate block of code. You may find this article or this one of use. Naturally the official documentation should be helpful too.
Also, I would suggest using different files for input and output to make your life easier.
For example:
import csv

data = []
with open('Trial_test.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in reader:
        data.append(row)

# now do your calculations on the 'data' object.

with open('Trial_test_new.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ', quotechar='|')
    for row in data:
        writer.writerow(row)
Something like that, anyway!
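To make the averaging idea concrete: compute the averages from the in-memory rows, then write everything, summary included, to the output in one pass. A sketch with hypothetical data and a hypothetical numeric column at index 1:

```python
import csv
import io

# Hypothetical in-memory stand-in for the input file
src = io.StringIO("s1,2.0\ns2,4.0\n")
rows = list(csv.reader(src))

# average the numeric column in memory, before any writing happens
avg = sum(float(r[1]) for r in rows) / len(rows)
print(avg)  # 3.0

# write the original rows plus a summary row to a separate output
out = io.StringIO()
writer = csv.writer(out)
writer.writerows(rows + [['average', avg]])
```

Keeping the read, the calculation, and the write in separate stages avoids reading from and appending to the same file at once, which is what tangles the original code.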