import csv

def csv_to_kvs(fileName):
    stringFloats = []
    with open(fileName, 'r') as csvFile:
        csvreader = csv.reader(csvFile)
        for row in csvreader:
            stringFloats.append(row)
    print(stringFloats)
I am trying to take a CSV file that is in string,float,float,float format with 10 rows, and I need to make each string a dictionary key whose value is the list of floats on the corresponding row.
So if the CSV file is:
age,16,17,18
area,1,7,4
call,2,3,6
The code needs to return {age:[16,17,18],etc...}. Any steps in the right direction are appreciated. I am learning CSV file reading and don't understand it too well.
When you read the row, you have the dictionary key in column 0 and the values in the remaining columns. You can slice the row, optionally converting to float on the way, and assign to the needed dict.
import csv

def csv_to_kvs(fileName):
    stringFloats = {}
    with open(fileName, 'r') as csvFile:
        csvreader = csv.reader(csvFile)
        for row in csvreader:
            # assuming 1 and following should be floats
            stringFloats[row[0]] = [float(val) for val in row[1:]]
    print(stringFloats)
    return stringFloats
(...and come to terms with 4 space indentation!)
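As a quick check, for the sample file above (assuming it is saved as sample.csv, a filename chosen just for illustration) this would print and return the dict with the values converted to floats:
result = csv_to_kvs('sample.csv')  # 'sample.csv' is a placeholder filename
# prints: {'age': [16.0, 17.0, 18.0], 'area': [1.0, 7.0, 4.0], 'call': [2.0, 3.0, 6.0]}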
Related
I have a tab-delimited csv file.
My csv file:
0.227996254681648 0.337028824833703 0.238163571416268 0.183009231781289 0.085746697332588 0.13412895376826
0.247891283973758 0.335555555555556 0.272129379268419 0.187328622765857 0.085921240923626 0.128372465534807
0.264761012183693 0.337777777777778 0.245917821271498 0.183211905363232 0.080493183753814 0.122786059549795
0.30506091846298 0.337777777777778 0.204265153911403 0.208453197418743 0.0715575291087 0.083682658454807
0.222748815165877 0.337028824833703 0.209714942778068 0.084252659537679 0.142013573559938 0.234672985858848
Now I would like to read each line from the csv file, do something with each element of the row, and then do the same thing for the next line, and so on.
My code:
import csv

lines = []
with open("/path/testfile.csv") as f:
    csvReader = csv.reader(f, delimiter="\t")
    for row in csvReader:
        x = row[0]  # access first floating number of each line from csv
        y = row[1]  # access second floating number of each line from csv
        z = row[2]  # access third floating number of each line from csv
        r = row[3]  # access fourth floating number of each line from csv
        s = row[4]  # access fifth floating number of each line from csv
        t = row[5]  # access sixth floating number of each line from csv
        # do something else with each element
Here I only included print(row[0]) into the for loop:
lines = []
with open("/path/testfile.csv") as f:
    csvReader = csv.reader(f, delimiter="\t")
    for row in csvReader:
        print(row[0])
But when trying only print(row[0]), it already prints out all the values from the csv file. How can I access each element of each row in Python?
Not sure if you are familiar with the pandas library. You could use pandas which will simplify things a lot.
Code
import pandas as pd
df = pd.read_csv('./data/data.csv', delimiter='\t', header=None)
print(df)
Output
0 1 2 3 4 5
0 0.227996 0.337029 0.238164 0.183009 0.085747 0.134129
1 0.247891 0.335556 0.272129 0.187329 0.085921 0.128372
2 0.264761 0.337778 0.245918 0.183212 0.080493 0.122786
3 0.305061 0.337778 0.204265 0.208453 0.071558 0.083683
4 0.222749 0.337029 0.209715 0.084253 0.142014 0.234673
You can then perform any operation you want on any column. Example:
df[0] = df[0]*10 # Multiply all numbers in the 0th column by 10
Just add another loop:
lines = []
with open("/path/testfile.csv") as f:
    csvReader = csv.reader(f, delimiter="\t")
    for row in csvReader:
        for element in row:
            do_something_with(element)
I recommend taking a look at the pandas library; it's a good starting point. See the User Guide. It will allow you to handle and process your data more easily.
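As a rough sketch under the same assumptions as the question (the path and the six columns come from the sample above), reading the tab-delimited file and working row by row with pandas could look like this:
import pandas as pd

# Read the tab-separated file; header=None because the sample has no header row
df = pd.read_csv("/path/testfile.csv", sep="\t", header=None)

# Iterate over the rows; each row is a tuple holding the six floats of that line
for row in df.itertuples(index=False):
    x, y, z, r, s, t = row  # same names as in the question's snippet
    # do something with each element, e.g. print the first one
    print(x)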
I want to read a CSV file generated by my other script and I need to check 2 columns at the same time. The problem is that my loop stops because there are empty values on some lines and it can't reach the following value. For example:
HASH 1111
HASH 2222
HASH 3333
HASH 4444
HASH 5555
HASH
HASH
HASH 6666
I can't read past row 5, because rows 6 and 7 have empty values and I also need to read row 8. Here is my code.
import csv

with open('vts.csv') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter=';')
    next(csvReader)
    VTs = []
    for row in csvReader:
        VT = row
        VTs.append(VT)

for row in VTs:
    print(row[0], row[4])
Is there any way to continue the listing without manually cleaning up the file in Excel?
First, a csv file is not an Excel file. The former is a delimited text file, the latter is a binary one.
Next, your problem is not at reading time: the csv module happily accepts files with a variable number of fields across rows, including empty lines, which just give empty lists for row.
So the fix is just:
...
for row in VTs:
    if len(row) > 4:
        print(row[0], row[4])
There is no problem with your code except for the print(row[0], row[4]): the given data does not have that many columns. I tested your code as follows:
.py
import csv

with open('vts.csv') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter=';')
    next(csvReader)
    VTs = []
    for row in csvReader:
        VT = row
        VTs.append(VT)

for row in VTs:
    print(row[0], row[1])
vts.csv
HASH;1111
HASH;2222
HASH;3333
HASH;4444
HASH;5555
HASH;
HASH;
HASH;6666
If your data is as in the sample, you don't really need delimiter=';', since csv means comma-separated values, not semicolon-separated.
Anyway, you can just skip rows where the intended column does not exist, assuming your input is in proper csv format as below:
col1,col2
hash1,1111
hash2,2222
...
You can use csv.reader as you did:
import csv

with open('vts.csv') as csvDataFile:
    csvReader = csv.reader(csvDataFile)  # comma is the default delimiter
    next(csvReader)
    # csv.reader returns an iterator, which you can convert to a list as below
    VTs = list(csvReader)

for row in VTs:
    if len(row) == 2:
        print(row[0], row[1])
If your goal is only to inspect the data, you can conveniently use a pandas.DataFrame:
import pandas as pd
df = pd.read_csv("vts.csv")
print(df.dropna()) # This will print all rows without any missing data
import csv

list3 = []
with open('**directory**') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        list3.append(row)
I'm completely new to data analysis using Python, and require some assistance.
The file I'm accessing contains data from 5 people (CSV file). There are 3 columns - participant number, pre-task Score, and post-task Score.
I'm essentially trying to access this file (using csv.DictReader) and manipulate the data. By this, I mean I want to calculate the difference between the post-task score and pre-task score, for each participant, and print this to the screen.
However, I'm not sure how to do this. I can print each row to the screen, and I can save each row in a list (as I've done above) - but I'm clueless as to how I am to manipulate/deal with this data. I'm wondering if there is something better than the module I'm currently using.
Calculating the difference between the second and third columns in a CSV file can be accomplished as follows:
import csv

with open('file.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    # skip the header row; remove this next line if there is no header
    next(reader, None)
    for row in reader:
        difference = float(row[2]) - float(row[1])
        print(difference)
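If you would rather keep csv.DictReader as in your snippet, the same calculation can use column names instead of indexes. This is only a sketch: the file name and the header names 'participant', 'pre_score' and 'post_score' are assumptions, so substitute whatever your file actually uses.
import csv

with open('scores.csv', newline='') as csvfile:  # placeholder path
    reader = csv.DictReader(csvfile)
    for row in reader:
        # column names are assumed; adjust them to match your header
        difference = float(row['post_score']) - float(row['pre_score'])
        print(row['participant'], difference)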
I have a CSV file with 100 rows.
How do I read specific rows?
I want to read say the 9th line or the 23rd line etc?
You could use a list comprehension to filter the file like so:
import csv

with open('file.csv') as fd:
    reader = csv.reader(fd)
    interestingrows = [row for idx, row in enumerate(reader) if idx in (28, 62)]
# now interestingrows contains the rows at index 28 and 62 (0-based, counting from the first line)
Use list to grab all the rows at once as a list. Then access your target rows by their index/offset in the list. For example:
#!/usr/bin/env python
import csv

with open('source.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    rows = list(csv_reader)
    print(rows[8])
    print(rows[22])
You simply skip the necessary number of rows:
with open("test.csv", "rb") as infile:
r = csv.reader(infile)
for i in range(8): # count from 0 to 7
next(r) # and discard the rows
row = next(r) # "row" contains row number 9 now
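An alternative sketch using itertools.islice from the standard library does the same skipping without an explicit loop (indices are 0-based, so index 8 is the 9th line):
import csv
from itertools import islice

with open("test.csv", newline="") as infile:
    r = csv.reader(infile)
    row = next(islice(r, 8, 9))  # skip the first 8 rows and take the 9th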
You could read all of them and then use normal lists to find them.
import csv

with open('bigfile.csv', newline='') as longishfile:
    reader = csv.reader(longishfile)
    rows = [r for r in reader]

print(rows[9])
print(rows[88])
If you have a massive file, this can kill your memory but if the file's got less than 10,000 lines you shouldn't run into any big slowdowns.
You can do something like this:
import csv

with open('raw_data.csv') as csvfile:
    readCSV = list(csv.reader(csvfile, delimiter=','))
    row_you_want = readCSV[index_of_row_you_want]
Maybe this could help you; using pandas you can easily do it with loc.
'''
Reading the 3rd record using pandas -> loc
Note: the index starts from 0,
so to read the 3rd record use 3 - 1 -> 2.
loc[[2], :] -> read the 3rd row, with ':' selecting the entire row
'''
import pandas as pd

df = pd.read_csv('employee_details.csv')
df.loc[[2], :]
Output :
I have the csv file as follows:
product_name, product_id, category_id
book, , 3
shoe, 3, 1
lemon, 2, 4
I would like to update product_id of each row by providing the column name using python's csv library.
So for example, if I pass:
update_data = {"product_id": [1,2,3]}
then the csv file should be:
product_name, product_id, category_id
book, 1, 3
shoe, 2, 1
lemon, 3, 4
You can use your existing dict and iter to take the items in order, e.g.:
import csv

update_data = {"product_id": [1, 2, 3]}

# Convert the values of your dict to be directly iterable so we can `next` them
to_update = {k: iter(v) for k, v in update_data.items()}

with open('input.csv', newline='') as fin, open('output.csv', 'w', newline='') as fout:
    # create in/out csv readers, skip initial space so it matches the update dict,
    # and write the header out
    csvin = csv.DictReader(fin, skipinitialspace=True)
    csvout = csv.DictWriter(fout, csvin.fieldnames)
    csvout.writeheader()
    for row in csvin:
        # Update rows - if we have something left and it's in the update dictionary,
        # use that value, otherwise keep the value that's already in the column.
        row.update({k: next(to_update[k], row[k]) for k in row if k in to_update})
        csvout.writerow(row)
Now, this assumes that each new column value goes to the corresponding row number and that existing values are used once the new ones run out. You could change that logic to only use new values when the existing value is blank, for instance (or whatever other criteria you wish); a sketch of that variant follows.
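A minimal sketch of that blank-only variant, assuming the same csvin/csvout/to_update setup as above; only the loop body changes:
for row in csvin:
    # Pull a new value only when the existing cell is empty; otherwise keep it.
    # next() is only evaluated (and the iterator only advanced) for blank cells.
    row.update({k: next(to_update[k], row[k]) if not row[k].strip() else row[k]
                for k in row if k in to_update})
    csvout.writerow(row)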
(assuming you're using 3.x)
Python has a CSV module in the standard library which helps read and amend CSV files.
Using that, I'd find the index of the column you are after and store it in the dictionary you've made. Once that has been found, it's simply a matter of popping each list item into its row.
import csv

update_data = {"product_id": [None, [1, 2, 3]]}
# I've nested the original list inside another so that we can hold the column index in the first position.
line_no = 0
# Simple counter for the first step.
new_csv = []
# Holds the new rows for when we rewrite the file.

with open('test.csv', 'r') as csvfile:
    # skipinitialspace=True so 'product_id' is matched even with a space after the comma
    filereader = csv.reader(csvfile, skipinitialspace=True)
    for line in filereader:
        if line_no == 0:
            for key in update_data:
                update_data[key][0] = line.index(key)
                # This finds us the column index and stores it for us.
        else:
            for key in update_data:
                line[update_data[key][0]] = update_data[key][1].pop(0)
                # Using the column index we enter the new data into the correct place,
                # removing it from the input list as we go.
        new_csv.append(line)
        line_no += 1

with open('test.csv', 'w', newline='') as csvfile:
    filewriter = csv.writer(csvfile)
    for line in new_csv:
        filewriter.writerow(line)