Parse csv file that has subtables of unequal dimensions

Parse csv file that has subtables of unequal dimensions - python

I Have a csv file with the following sample data:
[Network]
Network Settings
RECORDNAME,DATA
UTDFVERSION,8
Metric,0
yellowTime,3.5
allRedTime,1.0
Walk,7.0
DontWalk,11.0
HV,0.02
PHF,0.92
[Nodes]
Node Data
INTID,TYPE,X,Y,Z,DESCRIPTION,CBD,Inside Radius,Outside Radius,Roundabout Lanes,Circle Speed
1,1,111152,12379,0,,,,,,
2,1,134346,12311,0,,,,,,
3,3,133315,12317,0,,,,,,
4,1,133284,13574,0,,,,,,
I need help figuring out how to place this into two separate tables using python. I have the following code so far, but I get a keyerror on "if row['RECORDNAME'] == 'Network Settings':" when I try to use it.
# Open the file
with open('filename.csv', 'r') as f:
# Create a reader
reader = csv.DictReader(f)
# Initialize empty lists for the tables
network_table = []
nodes_table = []
# Loop through the rows
for row in reader:
# Check if the row contains the "RECORDNAME" key
if 'RECORDNAME' in row:
# Check if the row belongs to the "Network" or "Nodes" section
if row['RECORDNAME'] == 'Network Settings':
# Add the row to the "Network" table
network_table.append(row)
elif row['INTID'] is not None:
# Add the row to the "Nodes" table
nodes_table.append(row)
# Print the tables
print(network_table)
print(nodes_table)
Any suggestions would be appreciated.

this is one way to solve this, read the file for one table, and then store that data. comment added to code, hope they helps.
# Gist - read the rows, and determine when the data starts for the [![enter image description here][1]][1]next category
import csv
import re
result={} # store in the list , each item as one type of data
with open('node.csv', 'r') as csvfile:
csvreader = csv.reader(csvfile, delimiter=',')
for e in csvreader:
if re.search('\[.*\]',''.join(e)): # config found pattern [....] example [Network]
table_data=True
key_dict=next(csvreader) # get the key, so know the following data below to that category
key_dict=''.join(key_dict)
key_data=[]
while table_data:
try:
detail_data=next(csvreader)
except:
table_data=False # handling the eof
if not ''.join(detail_data):
table_data=False
key_data.append(detail_data)
result[key_dict]=key_data # store the data
each table can be accessed like result['Network Settings'].

Related

Merge rows in a CSV to a column

I am new in python, I have one CSV file, it has more than 1000 rows, I want to merge particular rows and move those rows to another column, can any one help?
This is the source csv file I have:
I want to move emails under members column with comma separator, like this image:

To read csv files in Python, you can use the csv module. This code does the merging you're looking for.
import csv
output = [] # this will store a list of new rows
with open('test.csv') as f:
reader = csv.reader(f)
# read the first line of the input as the headers
header = next(reader)
output.append(header)
# we will build up groups and their emails
emails = []
group = []
for row in reader:
if len(row) > 1 and row[1]: # "UserGroup" is given
if group:
group[-1] = ','.join(emails)
group = row
output.append(group)
emails = []
else: # it isn't, assume this is an email
emails.append(row[0])
group[-1] = ','.join(emails)
# now write a new file
with open('new.csv', 'w') as f:
writer = csv.writer(f)
writer.writerows(output)

Updating a specific csv column based on randomname

My code pulls a random name from a csv file. When a button is pressed i want my code to search through the csv file, and update the cell next to the name generated previously in the code.
The variable in which the name is stored in is called name
The index which pulls the random name from the csv file is stored in the variable y
The function looks like this. I have asked this question previously however have had no luck in receiving answers, so i have made edits to the function and hopefully made it more clear.
namelist_file = open('StudentNames&Questions.csv')
reader = csv.reader(namelist_file)
writer = csv.writer(namelist_file)
rownum=0
array=[]
for row in reader:
if row == name:
writer.writerow([y], "hello")
Only the first two columns of the csv file are relevant
This is the function which pulls a random name from the csv file.
def NameGenerator():
namelist_file = open('StudentNames&Questions.csv')
reader = csv.reader(namelist_file)
rownum=0
array=[]
for row in reader:
if row[0] != '':
array.append(row[0])
rownum=rownum+1
length = len(array)-1
i = random.randint(1,length)
global name
name = array[i]
return name

There are a number of issues with your code:
You're trying to have both a reader object and a writer on the same file at the same time. Instead, you should read the file contents in, make any changes necessary and then write the whole file back out at the end.
You need to open the file in write mode in order to actually make changes to the contents. Currently, you don't specify what mode you're using so it defaults to read mode.
row is actually a list representing all data in the row. Therefore, it cannot be equal to the name you're searching, only the 0th index might be.
The following should work:
with open('StudentNames&Questions.csv', 'r') as infile:
reader = csv.reader(infile)
data = [row for row in reader]
for row in data:
if row[0] == name:
row[1] += 1
with open('StudentNames&Questions.csv', 'w', newline='') as outfile:
writer = csv.writer(outfile)
writer.writerows(data)

Creating a versatile data class in Python

I'm hoping you good folks can help with a project I'm working on. Essentially, I am trying to create a class that will take as an input a CSV file, examine the file for the number of columns of data, and store that data in key, value pairs in a dictionary. The code I have up to this point is below:
import csv
class DataStandard():
'''class to store and examine columnar data saved as a csv file'''
def __init__(self, file_name):
self.file_name = file_name
self.full_data_set = {}
with open(self.file_name) as f:
reader = csv.reader(f)
# get labels of each column in list format
self.col_labels = next(reader)
# find the number of columns of data in the file
self.number_of_cols = len(self.col_labels)
# initialize lists to store data using column label as key
for label in self.col_labels:
self.full_data_set[label] = []
The piece I am having a hard time with is once the dictionary (full_data_set) is created I'm not sure how to loop through the remainder of the CSV file and store the data in the respective values for each key (column). Everything I have tried until now hasn't worked because of how I have to loop through the csv.reader object.
I hope this question makes sense, but please feel free to ask any clarifying questions. Also, if you think of an approach that may work in a better more pythonic way I would appreciate the input. This is one of my first self-guided projects on class, so the subject is fairly new to me. Thanks in advance!

To read rows you can use for row in reader
data = []
with open('test.csv') as f:
reader = csv.reader(f)
headers = next(reader)
for row in reader:
d = dict(zip(headers, row))
#print(d)
data.append(d)
print('data:', data)
As said #PM2Ring csv has DictReader
with open('test.csv') as f:
reader = csv.DictReader(f)
data = list(reader)
print('data:', data)

This might give you ideas towards a solution. It is assumed that the labels are only on row 1, and the rest is data, and then the row length becomes 0 when there is no data:
import csv
class DataStandard():
'''class to store and examine columnar data saved as a csv file'''
def __init__(self, file_name):
self.file_name = file_name
self.full_data_set = {}
#modify method to the following:
with open(self.file_name) as f:
reader = csv.reader(f)
for row in reader:
if row = 0:
# get labels of each column in list format
self.col_labels = next(reader)
# find the number of columns of data in the file
self.number_of_cols = len(self.col_labels)
# initialize lists to store data using column label as key
for label in self.col_labels:
self.full_data_set[label] = []
else:
if len(row) != 0:
for i in range(self.number_of_cols):
label = self.col_labels[i]
self.full_data_set[label] = next(reader)
...My one concern is that while the 'with open(...)' is valid, some levels of indentation can be ignored, from my experience. In that case, to reduce the number of indentations, I would just separate 'row=0' and 'row!=0' operations into different instances of 'with open(...)' i.e. do row 1, close, open again, do row 2.

Trying to understand why my python script is skipping the second row in a csv file

I am putting together a python script to create an output of email addresses from a csv. For whatever reason, the script below is skipping the second row (row after the headers) and only recording the data I need after that row. For example, if the users in the csv file were:
Username:
test
admin
root
The output would only be:
Emails:
admin#gmail.com
root#gmail.com
Thus completely ignoring the first entry. Here is the code, any thoughts on the matter are greatly appreciated.
for filename in glob.glob(path):
with open(filename, 'r') as f:
reader = csv.DictReader(f)
initialExportOneList = []
for row in reader:
iE = [row['Computer Name'], row['Username']]
finalExportInOneList = [column['Username'] for column in reader if column['Username']]
initialExportOneList.append(iE)
domain = '#gmail.com'
for i in finalExportInOneList:
fullEmailCreation = i + domain
print(fullEmailCreation)

You iterate over reader inside the loop, to get finalExportInOneList.

for column in reader
I am sure you want
for column in row

How to ignore the first line of data when processing CSV data?

I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv
with open('all16.csv', 'rb') as inf:
incsv = csv.reader(inf)
column = 1
datatype = float
data = (datatype(column) for row in incsv)
least_value = min(data)
print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.

You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:
import csv
with open('all16.csv', 'r', newline='') as file:
has_header = csv.Sniffer().has_header(file.read(1024))
file.seek(0) # Rewind.
reader = csv.reader(file)
if has_header:
next(reader) # Skip header row.
column = 1
datatype = float
data = (datatype(row[column]) for row in reader)
least_value = min(data)
print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:

To skip the first line just call:
next(inf)
Files in Python are iterators over lines.

Borrowed from python cookbook,
A more concise template code might look like this:
import csv
with open('stocks.csv') as f:
f_csv = csv.reader(f)
headers = next(f_csv)
for row in f_csv:
# Process row ...

In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Read the file first, then pass the list to csv.DictReader.
with open('all16.csv') as tmp:
# Skip first line (if any)
next(tmp, None)
# {line_num: row}
data = dict(enumerate(csv.DictReader(tmp)))

You would normally use next(incsv) which advances the iterator one row, so you skip the header. The other (say you wanted to skip 30 rows) would be:
from itertools import islice
for row in islice(incsv, 30, None):
# process

use csv.DictReader instead of csv.Reader.
If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. you would then be able to access field values using row["1"] etc

Python 2.x
csvreader.next()
Return the next row of the reader’s iterable object as a list, parsed
according to the current dialect.
csv_data = csv.reader(open('sample.csv'))
csv_data.next() # skip first row
for row in csv_data:
print(row) # should print second row
Python 3.x
csvreader.__next__()
Return the next row of the reader’s iterable object as a list (if the
object was returned from reader()) or a dict (if it is a DictReader
instance), parsed according to the current dialect. Usually you should
call this as next(reader).
csv_data = csv.reader(open('sample.csv'))
csv_data.__next__() # skip first row
for row in csv_data:
print(row) # should print second row

The documentation for the Python 3 CSV module provides this example:
with open('example.csv', newline='') as csvfile:
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
# ... process CSV file contents here ...
The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:
if sniffer.has_header():
for header_row in reader:
break
for data_row in reader:
# do something with the row

this might be a very old question but with pandas we have a very easy solution
import pandas as pd
data=pd.read_csv('all16.csv',skiprows=1)
data['column'].min()
with skiprows=1 we can skip the first row then we can find the least value using data['column'].min()

The new 'pandas' package might be more relevant than 'csv'. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()

Because this is related to something I was doing, I'll share here.
What if we're not sure if there's a header and you also don't feel like importing sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you could just use an if statement:
# Let's say there's 4 columns
with open('file.csv') as csvfile:
csvreader = csv.reader(csvfile)
# read first line
first_line = next(csvreader)
# My headers were just text. You can use any suitable conditional here
if len(first_line) == 4:
array.append(first_line)
# Now we'll just iterate over everything else as usual:
for row in csvreader:
array.append(row)

Well, my mini wrapper library would do the job as well.
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know what header column index one is, for example "Column 1", you can do this instead:
>>> min(data.column["Column 1"])

For me the easiest way to go is to use range.
import csv
with open('files/filename.csv') as I:
reader = csv.reader(I)
fulllist = list(reader)
# Starting with data skipping header
for item in range(1, len(fulllist)):
# Print each row using "item" as the index value
print (fulllist[item])

I would convert csvreader to list, then pop the first element
import csv
with open(fileName, 'r') as csvfile:
csvreader = csv.reader(csvfile)
data = list(csvreader) # Convert to list
data.pop(0) # Removes the first row
for row in data:
print(row)

I would use tail to get rid of the unwanted first line:
tail -n +2 $INFIL | whatever_script.py

just add [1:]
example below:
data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)**[1:]**
that works for me in iPython

Python 3.X
Handles UTF8 BOM + HEADER
It was quite frustrating that the csv module could not easily get the header, there is also a bug with the UTF-8 BOM (first char in file).
This works for me using only the csv module:
import csv
def read_csv(self, csv_path, delimiter):
with open(csv_path, newline='', encoding='utf-8') as f:
# https://bugs.python.org/issue7185
# Remove UTF8 BOM.
txt = f.read()[1:]
# Remove header line.
header = txt.splitlines()[:1]
lines = txt.splitlines()[1:]
# Convert to list.
csv_rows = list(csv.reader(lines, delimiter=delimiter))
for row in csv_rows:
value = row[INDEX_HERE]

Simple Solution is to use csv.DictReader()
import csv
def read_csv(file): with open(file, 'r') as file:
reader = csv.DictReader(file)
for row in reader:
print(row["column_name"]) # Replace the name of column header.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Parse csv file that has subtables of unequal dimensions - python

Related

Merge rows in a CSV to a column

Updating a specific csv column based on randomname

Creating a versatile data class in Python

Trying to understand why my python script is skipping the second row in a csv file

How to ignore the first line of data when processing CSV data?

Categories

Resources