Selecting rows in a CSV file with a variable number of columns


I have a CSV file from which I need to select certain columns. Removing AGE and MEAN WEIGHT is easy, because these column names are the same in every file.

Input:

```
ID,AGE,HEIGHT,MEAN WEIGHT,20-Nov-2002,05-Mar-2003,09-Apr-2003,23-Jul-2003
1,23,1.80,80,78,78,82,82
2,25,1.60,58,56,60,60,56
3,20,1.90,100,98,102,98,102
```

Desired output:

```
ID,HEIGHT,20-Nov-2002,05-Mar-2003,09-Apr-2003,23-Jul-2003
1,1.80,78,78,82,82
2,1.60,56,60,60,56
3,1.90,98,102,98,102
```
I have this code:

```python
import csv

out = open("C:/Users/Pedro/data.csv")
rdr = csv.reader(out)
result = open('C:/Users/Pedro/datanew.csv', 'w')
wtr = csv.writer(result, delimiter=',', lineterminator='\n')
for row in rdr:
    wtr.writerow((row[0], row[2], row[4], row[5], row[6], row[7]))
out.close()
result.close()
```
But my difficulty is selecting all the columns that contain dates, because the number of date columns can vary. Perhaps the solution is to detect the character - in the header (for example in row[4]).

I'm not 100% sure what you're asking, but here is a script that may do what you want: it reproduces the file with all of an unknown number of date columns, plus your columns 0 and 2 (ID and HEIGHT):

```python
import csv

with open('data.csv', newline='') as infile:  # Use 'with' to close files automatically
    reader = csv.reader(infile)
    headers = next(reader)  # Read first line (reader.next() only exists in Python 2)
    # Figure out which columns have '-' in them (assume these are dates)
    date_columns = [col for col, header in enumerate(headers) if '-' in header]
    # Add our desired other columns
    all_columns = [0, 2] + date_columns
    with open('new.csv', 'w', newline='') as outfile:
        writer = csv.writer(outfile, delimiter=',', lineterminator='\n')
        # write headers
        writer.writerow([headers[i] for i in all_columns])
        # write data
        for row in reader:  # Read remaining data from our input CSV
            writer.writerow([row[i] for i in all_columns])
```
Does that help?
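A slightly stricter variant of the same idea, sketched under the assumption that every date header looks like 20-Nov-2002 (strptime format %d-%b-%Y, English month abbreviations): instead of accepting any header that contains '-', try to parse each header as a date, so a name such as "MEAN-WEIGHT" would not be picked up by accident. The names is_date, select_columns and the keep parameter are hypothetical; io.StringIO stands in for real files in the demo.

```python
import csv
import io
from datetime import datetime


def is_date(text, fmt='%d-%b-%Y'):
    """Return True if text parses as a date like 20-Nov-2002 (English month names assumed)."""
    try:
        datetime.strptime(text.strip(), fmt)
        return True
    except ValueError:
        return False


def select_columns(infile, outfile, keep=('ID', 'HEIGHT')):
    """Copy the columns named in keep plus every column whose header is a date."""
    reader = csv.reader(infile)
    headers = next(reader)
    # Column indices to keep, in their original left-to-right order
    wanted = [i for i, h in enumerate(headers) if h in keep or is_date(h)]
    writer = csv.writer(outfile, lineterminator='\n')
    writer.writerow([headers[i] for i in wanted])
    for row in reader:
        writer.writerow([row[i] for i in wanted])


if __name__ == '__main__':
    sample = io.StringIO(
        "ID,AGE,HEIGHT,MEAN WEIGHT,20-Nov-2002,05-Mar-2003\n"
        "1,23,1.80,80,78,78\n"
    )
    result = io.StringIO()
    select_columns(sample, result)
    print(result.getvalue())
```

With real files you would pass open handles instead, e.g. open('data.csv', newline='') and open('datanew.csv', 'w', newline='').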

Related

Merge rows in a CSV to a column

I'm new to Python. I have a CSV file with more than 1000 rows, and I want to merge particular rows and move them into another column. Can anyone help?
This is the source CSV file I have:
I want to move the emails into the members column, separated by commas, like in this image:

To read CSV files in Python, you can use the csv module. This code does the merging you're looking for.

```python
import csv

output = []  # this will store a list of new rows
with open('test.csv', newline='') as f:
    reader = csv.reader(f)
    # read the first line of the input as the headers
    header = next(reader)
    output.append(header)
    # we will build up groups and their emails
    emails = []
    group = []
    for row in reader:
        if len(row) > 1 and row[1]:  # "UserGroup" is given
            if group:
                group[-1] = ','.join(emails)
            group = row
            output.append(group)
            emails = []
        elif row:  # it isn't (and the line is not blank), assume this is an email
            emails.append(row[0])
    if group:  # store the emails of the last group
        group[-1] = ','.join(emails)

# now write a new file
with open('new.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(output)
```
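The same merge can be split into a pure function that is easy to test. This sketch keeps the answer's assumptions: column 1 holds the "UserGroup" name on group rows, column 0 holds the email on the rows that follow, and the last column of a group row is the members column. Emails that appear before any group row are dropped. The function names merge_groups and merge_csv are hypothetical.

```python
import csv
import io


def merge_groups(rows):
    """Attach each email row to the group row above it, joining emails with commas."""
    merged = []  # list of (group_row, emails) pairs
    for row in rows:
        if len(row) > 1 and row[1]:  # a "UserGroup" row starts a new group
            merged.append((list(row), []))
        elif row and merged:  # otherwise treat column 0 as an email of the current group
            merged[-1][1].append(row[0])
    result = []
    for group, emails in merged:
        group[-1] = ','.join(emails)  # write into the last (members) column
        result.append(group)
    return result


def merge_csv(infile, outfile):
    """Copy the header, then write one merged row per group."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator='\n')
    writer.writerow(next(reader))
    writer.writerows(merge_groups(reader))


if __name__ == '__main__':
    src = io.StringIO("Email,UserGroup,Members\n,Admins,\na@x.com,,\nb@x.com,,\n,Users,\nc@x.com,,\n")
    dst = io.StringIO()
    merge_csv(src, dst)
    print(dst.getvalue())
```

Because the joined emails contain commas, csv.writer quotes that field automatically, so the output still parses as three columns.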

Print first 5 rows of a large CSV file (not using pandas)

I'm attempting to simplify Python code that prints the first five rows (plus the header) of a large CSV file, with more condensed output if possible. I would prefer to use pandas; however, in this case I would like to use just import csv and import os (I'm a Mac user).
Code as follows:

```python
import csv

filename = "/Users/xx/Desktop/xx.csv"
fields = []
rows = []
with open(filename, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    fields = next(csvreader)
    for row in csvreader:
        rows.append(row)
    print("Total no. of rows: %d" % (csvreader.line_num))
print('Field names are: ' + ', '.join(field for field in fields))
print('\nFirst 5 rows are:\n')
for row in rows[:5]:
    for col in row:
        print("%10s" % col, end=" ")
    print('\n')
```
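A more condensed version is sketched below, assuming you only need the header and the first few rows: itertools.islice stops reading after n rows, so the rest of a large file is never loaded into a list. The trade-off is that the total row count is no longer known; getting it would still require reading the whole file (for example sum(1 for _ in reader)). Only csv and itertools from the standard library are used; os is not needed. The names head and print_head are hypothetical.

```python
import csv
import io
from itertools import islice


def head(f, n=5):
    """Return (header, up to n data rows), reading only as far as needed."""
    reader = csv.reader(f)
    header = next(reader)
    return header, list(islice(reader, n))


def print_head(f, n=5, width=10):
    header, rows = head(f, n)
    print('Field names are: ' + ', '.join(header))
    print(f'\nFirst {len(rows)} rows are:\n')
    for row in rows:
        # right-align each field in `width` characters, like "%10s"
        print(' '.join(f'{col:>{width}}' for col in row))


if __name__ == '__main__':
    sample = io.StringIO("a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n11,12\n")
    print_head(sample)
```

For a real file: with open(filename, newline='') as f: print_head(f).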

CSV file parsing and making it a dict

I have a .csv file that I'm trying to turn into a dict. I mostly tried pandas and csv.DictReader, but so far I can only print the data with DictReader, and not in the form I want.
The main problem is that for about the first 50 rows the file looks like this (one header and one data value per row):

```
header;data
```

From row 50 onward the schema changes to:

```
header1;header2;header3;header4
data1;data2;data3;data4
...
```

This is the code I have so far:

```python
import csv

with open(filename, 'r', encoding='utf-16') as f:
    for line in csv.DictReader(f):
        print(line)
```

Thanks for your help.

You can't use DictReader for this, because it requires all the rows to have the same fields.
Use csv.reader and check the length of the row that it returns. When the length changes, treat that as a new header.
Hopefully you don't have adjacent sections of the file that have the same number of fields but different headers. It will be difficult for the script to detect when the section changes.
```python
import csv

data = []
with open(filename, 'r', encoding='utf-16', newline='') as f:  # filename as in the question
    r = csv.reader(f, delimiter=';')
    # process first 52 rows in format header;data
    for _ in range(52):
        row = next(r)
        data.append({row[0]: row[1]})
    # rest of file is a header row followed by variable number of data rows
    header = next(r)
    for row in r:
        if len(row) != len(header):  # new header
            header = row
            continue
        d = dict(zip(header, row))
        data.append(d)
```
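If you would rather not hard-code the 52 leading rows, the same "a change in row length means a new header" rule can also detect where the key;value section ends. This is a sketch under the assumption that the leading section always has exactly two fields per row and that every later section has a different field count from the one before it (the limitation mentioned above still applies). parse_sections is a hypothetical name.

```python
import csv
import io


def parse_sections(f, delimiter=';'):
    """Return a list of dicts: one per key;value row, then one per data row."""
    data = []
    header = None
    for row in csv.reader(f, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        if header is None and len(row) == 2:
            data.append({row[0]: row[1]})  # leading "header;data" section
        elif header is None or len(row) != len(header):
            header = row  # a change in field count marks a new header row
        else:
            data.append(dict(zip(header, row)))
    return data


if __name__ == '__main__':
    sample = io.StringIO("name;Alice\nsite;Lab\nh1;h2;h3\n1;2;3\n4;5;6\n")
    for item in parse_sections(sample):
        print(item)
```

For the real file, open it as in the question (encoding='utf-16', plus newline='') and pass the handle to parse_sections.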

Rearranging data - row into multiple columns

I have a CSV file with over 1M records: (https://i.imgur.com/rhIhy5u.png)
I need the data arranged differently, so that the repeating "params" become columns/rows themselves, for example category1, category2, category3 (there are over 20 categories, with no repeats), while all the data keep their relations.
I tried using pandas and csv in Python, but I am completely new to this and have never worked with data like this before.
```python
import csv
import numpy as np

with open('./data.csv', 'r') as _filehandler:
    param = []
    csv_file_reader = csv.DictReader(_filehandler)
    for row in csv_file_reader:
        if row['Param'] not in param:
            param.append(row['Param'])

col = ""
for p in param:
    col += str(p) + '; '
print(col)

np.savetxt('./SortedWexdord.csv', param, delimiter=';', fmt='%s')
```
I've thought about it, but data is not my forte. Any ideas?

Here's something that should work. If you need more than one value per row normalized like this, you could edit the line beginning "category, value =" to grab a list of values instead of just row[1].

```python
import csv

data = {}
with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    next(reader)  # Skip header row
    for row in reader:
        category, value = row[0], row[1]  # Assumes category is in column 0 and target value is in column 1
        if category in data:
            data[category].append(value)
        else:
            data[category] = [value]  # New entry only for each unique category

# newline='' avoids double newlines on Windows ('wb' only works with the csv module in Python 2)
with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Category', 'Value'])
    for category in data:
        print([category] + data[category])
        writer.writerow([category] + data[category])  # Make a list starting with category and then listing each value
```
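If the goal is instead for each distinct param to become its own column, a long-to-wide pivot is another option. This sketch assumes hypothetical column names ID, Param and Value (only Param appears in the question); each ID becomes one output row with one column per param, in first-seen order. It keeps the whole table in memory, and if the same (ID, Param) pair occurs twice, the last value wins. pivot is a hypothetical name.

```python
import csv
import io


def pivot(infile, outfile, key='ID', param='Param', value='Value'):
    """Turn long rows (key, param, value) into one wide row per key."""
    table = {}    # key -> {param: value}
    params = []   # distinct params, in the order first seen
    seen = set()
    for row in csv.DictReader(infile):
        p = row[param]
        if p not in seen:
            seen.add(p)
            params.append(p)
        table.setdefault(row[key], {})[p] = row[value]
    # DictWriter's default restval='' fills params that a key never had
    writer = csv.DictWriter(outfile, fieldnames=[key] + params, lineterminator='\n')
    writer.writeheader()
    for k, values in table.items():
        writer.writerow({key: k, **values})


if __name__ == '__main__':
    src = io.StringIO("ID,Param,Value\n1,category1,a\n1,category2,b\n2,category1,c\n")
    dst = io.StringIO()
    pivot(src, dst)
    print(dst.getvalue())
```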

Read and Compare 2 CSV files on a row and column basis

I have two CSV files. data.csv and data2.csv.
First, I would like to strip the two data files down to the data I am interested in. I have figured this part out with data.csv. I would then like to compare them row by row, making sure that if a row is missing, it gets added.
Next, I want to look at column 2. If there is a value there, I want to write to column 3; if there is data in column 3, then write to column 4, etc.
My current program looks like this. I need some guidance.
Oh, and I am using Python 3.4.
```python
#!/usr/bin/python
__author__ = 'krisarmstrong'

import csv

searched = ['aircheck', 'linkrunner at', 'onetouch at']


def find_group(row):
    """Return the group index of a row
    0 if the row contains searched[0]
    1 if the row contains searched[1]
    etc
    -1 if not found
    """
    for col in row:
        col = col.lower()
        for j, s in enumerate(searched):
            if s in col:
                return j
    return -1


inFile = open('data.csv', newline='')
reader = csv.reader(inFile)
inFile2 = open('data2.csv', newline='')
reader2 = csv.reader(inFile2)
outFile = open('data3.csv', "w", newline='')
writer = csv.writer(outFile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
header = next(reader)
header2 = next(reader2)

# Build a list of items to sort. If row 12 contains 'LinkRunner AT' (group 1),
# store the triple (1, 12, row).
# When the triples are sorted later, all rows in group 0 come first, then
# all rows in group 1, etc.
stored = []
writer.writerow([header[0], header[3]])
for i, row in enumerate(reader):
    g = find_group(row)
    if g >= 0:
        stored.append((g, i, row))
stored.sort()
for g, i, row in stored:
    writer.writerow([row[0], row[3]])
inFile.close()
inFile2.close()
outFile.close()
```

Perhaps try:
```python
import csv

col1 = []
col2 = []
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        col1.append(row[0])
        col2.append(row[1])

for i in range(len(col1)):
    if col1[i] == '':
        pass  # thing to do if there is nothing for col1
    if col2[i] == '':
        pass  # thing to do if there is nothing for col2
```
This is a start at "making sure that if a row is missing to add it".
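Both steps of the question can be sketched as small functions on lists of rows (for example list(csv.reader(f)) after skipping the header). This assumes, hypothetically, that column 0 identifies a row in both files, and that "column 2" means index 1 (0-based) of each row. add_missing_rows and write_next_free are hypothetical names, not part of any library.

```python
def add_missing_rows(rows_a, rows_b, key=0):
    """Return rows_a plus every row of rows_b whose key column is not already present."""
    seen = {row[key] for row in rows_a if row}
    merged = [list(row) for row in rows_a]
    for row in rows_b:
        if row and row[key] not in seen:
            merged.append(list(row))
            seen.add(row[key])
    return merged


def write_next_free(row, value, start=1):
    """Put value into the first empty cell at or after index start (index 1 = column 2),
    extending the row when every cell from start onward is already filled."""
    for i in range(start, len(row)):
        if row[i] == '':
            row[i] = value
            return row
    row.extend([''] * max(0, start - len(row)))  # pad short rows up to start
    row.append(value)
    return row


if __name__ == '__main__':
    a = [['AirCheck', '1']]
    b = [['AirCheck', '2'], ['OneTouch AT', '3']]
    print(add_missing_rows(a, b))
    print(write_next_free(['LinkRunner AT', 'x', ''], 'y'))
```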

We Keep Coding
