Here is my code for reading my CSV file:
import csv

f = open("datatest.csv")
reader = csv.reader(f)
dataListed = [row for row in reader]
rc = csv.writer(f)

column1 = []
for row in dataListed:
    column1.append(row[0])

column2 = []
for row in dataListed:
    column2.append(row[1])

divide = []
for row in dataListed:
    divide = row[1] / row[2]
    print(divide)
Why does the "divide" list not work? Everything else works as it should, but for that part I always get an error that says something about strings, and when I try to convert row[1] and row[2] to float, it breaks too! Help is greatly appreciated.
I am a pure beginner. Thanks
There are a few issues in your code.
Firstly, your dataListed looks like this:
[['lis1', 'lis2'], ['1', '2'], ['2', '7'], ['3', '9'], ['10', '10']]
You are trying to divide two string items, like so:
divide = 'lis1' / 'lis2'  # TypeError: unsupported operand type(s) for /: 'str' and 'str'
so you need to remove the first element (the header row) from the list.
Secondly, in
divide = row[1] / row[2]
your rows have only 2 elements, and list indices start at 0, so it should be
divide = row[0] / row[1]
Complete code after correction:
import csv

f = open(r"Tomas.csv")
reader = csv.reader(f)
dataListed = [row for row in reader]

column1 = []
for row in dataListed:
    column1.append(row[0])

column2 = []
for row in dataListed:
    column2.append(row[1])

dataListed.pop(0)  # drop the header row before dividing

divide = []
for row in dataListed:
    re = int(row[1]) / int(row[0])
    divide.append(re)

print(divide)
Gives:
[2.0, 3.5, 3.0, 1.0]
Have you considered using other libraries, Thomas? Using pandas makes this very easy.
Say your csv looks like this:
lis1 lis2
0 1 2
1 2 7
2 3 9
3 10 10
Then
import pandas as pd
df = pd.read_csv(r"Thomas.csv")
df['new_list_after_Divison'] = (df['lis2']/df['lis1'])
print(df)
Gives:
lis1 lis2 new_list_after_Divison
0 1 2 2.0
1 2 7 3.5
2 3 9 3.0
3 10 10 1.0
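If you also want to save the new column, pandas can write the frame back out with to_csv. A small sketch with the same data built inline (an in-memory string stands in for the CSV file here):

```python
import io
import pandas as pd

# Same sample data as above, inline instead of read from Thomas.csv
csv_text = "lis1,lis2\n1,2\n2,7\n3,9\n10,10\n"
df = pd.read_csv(io.StringIO(csv_text))
df['new_list_after_Divison'] = df['lis2'] / df['lis1']

# Round-trip to CSV text; to_csv("out.csv", index=False) would write a file
out = df.to_csv(index=False)
print(out)
```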
Related
I have a .txt file of this sort
12
21
23
1
23
42
12
0
in which <12, 21, 23> are features and <1> is a label.
Again, <23, 42, 12> are features and <0> is the label, and so on.
I want to create a pandas dataframe from the above text file, which contains only a single column, splitting it into multiple columns.
The format of the dataframe is {column1, column2, column3, column4}, and there are no column names in it.
Can someone please help me out with this?
Thanks
import pandas as pd

df = dict()
features = list()
label = ''
filename = '.txt'

with open(filename) as fd:
    i = 0
    for line in fd:
        if i != 3:
            features.append(line.strip())
            i += 1
        else:
            label = line.strip()
            df[label] = features
            features = list()
            i = 0

df = pd.DataFrame(df)
df
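One caveat with this approach: the dict is keyed by label, so if the same label value appears for two different blocks, the later block overwrites the earlier one. A self-contained run of the same logic (in-memory text instead of the .txt file) shows the resulting frame:

```python
import io
import pandas as pd

text = "12\n21\n23\n1\n23\n42\n12\n0\n"

df = dict()
features = list()
i = 0
for line in io.StringIO(text):  # stands in for the open file handle
    if i != 3:
        features.append(line.strip())
        i += 1
    else:
        df[line.strip()] = features  # label becomes the column name
        features = list()
        i = 0

frame = pd.DataFrame(df)
print(frame)
```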
import pandas as pd

with open(<FILEPATH>, "r") as f:
    lines = f.readlines()

formatted = [int(line[:-1]) for line in lines]  # Remove \n and convert to int
labels = formatted[3::4]
features = list(zip(formatted[::4], formatted[1::4], formatted[2::4]))  # Modify this if there are more than three feature rows

data = {}
for i, label in enumerate(labels):
    data[label] = list(features[i])

df = pd.DataFrame(data)
Comment if you have any questions or find any errors, and I will make amendments.
You can use numpy. First, you need to ensure that the number of values is a multiple of 4.
Each record as a column, with the label as the header:

import numpy as np
import pandas as pd

a = np.loadtxt('file.txt').reshape((4, -1), order='F')
df = pd.DataFrame(a[:-1], columns=a[-1])
Output:
1.0 0.0
0 12.0 23.0
1 21.0 42.0
2 23.0 12.0
Each record as a new row:
a = np.loadtxt('file.txt').reshape((-1,4))
df = pd.DataFrame(a)
Output:
0 1 2 3
0 12.0 21.0 23.0 1.0
1 23.0 42.0 12.0 0.0
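As noted, the reshape raises a ValueError when the total count is not a multiple of 4, so a quick length check makes that failure mode explicit. A sketch with the sample values inline rather than read from file.txt:

```python
import numpy as np

# Sample values from the question, inline instead of np.loadtxt('file.txt')
values = np.array([12, 21, 23, 1, 23, 42, 12, 0], dtype=float)
assert values.size % 4 == 0, "file must contain a multiple of 4 values"

rows = values.reshape((-1, 4))  # each record (3 features + label) as a row
print(rows)
```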
import pandas as pd

row = []
i = 0
data = []
with open('a.txt') as f:
    for line in f:
        i += 1
        row.append(int(line.strip()))
        if i % 4 == 0:
            data.append(row)
            row = []

df = pd.DataFrame(data)
which results in df being:
0 1 2 3
0 12 21 23 1
1 23 42 12 0
What is the best way to take this string:
1
2
3
4
a
b
c
d
1
2
3
4
a
b
c
d
1
2
3
4
a
b
c
d
and transform it into a CSV containing 6 columns?
Desired output
Is a CSV which will be imported into Pandas:
1,a,1,a,1,a
2,b,2,b,2,b
etc..
Updated desired output as per comments to 6 rows.
Update: I can get the first row like this if I assign the string to the variable l:
l.split()[0::4]
['1', 'a', '1', 'a', '1', 'a']
with open('data.txt', 'r') as f:
    data = f.read().split("\n")

for i in range(4):
    d = list()
    for j in range(i, len(data), 4):
        d.append(data[j])
    with open('data.csv', 'a') as csv:
        csv.write(','.join(d) + "\n")
Even though Art's answer is accepted, here is another way using pandas. You wouldn't need to export the data prior to importing with pandas if you use something like this.
import pandas as pd

myFile = "lines_to_read2.txt"
myData = pd.DataFrame(columns=['col1', 'col2', 'col3', 'col4'])
mycolumns = 4
thisItem = list()

with open(myFile, 'r') as linesToRead:
    for thisLine in linesToRead:
        thisItem.append(thisLine.strip('\n, " "'))
        if len(thisItem) == mycolumns:
            myData = myData.append({'col1': thisItem[0], 'col2': thisItem[1], 'col3': thisItem[2], 'col4': thisItem[3]}, ignore_index=True)
            thisItem = list()

myData.to_csv('lines_as_csv_file.csv', index=False)
print(myData)  # Full table
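A note for current pandas: DataFrame.append was deprecated in 1.4 and removed in 2.0, so on a recent install you would collect the rows first and build the frame in one go. A sketch of the same grouping logic with in-memory sample data:

```python
import io
import pandas as pd

# Sample input, inline instead of lines_to_read2.txt
text = "1\n2\n3\n4\na\nb\nc\nd\n"
lines = [ln.strip() for ln in io.StringIO(text)]

# Chunk into groups of 4 and build the frame once; no append needed
rows = [lines[i:i + 4] for i in range(0, len(lines), 4)]
myData = pd.DataFrame(rows, columns=['col1', 'col2', 'col3', 'col4'])
print(myData)
```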
I am trying to import data from a file and then add it to an array. I know this is not the best way to add elements to a numpy array. Nevertheless, why is the data not appending? The last element of the CSV is 1.1, and that's what I get when I do print(dd).
with open('C:\\Users\jez40\.PyCharmCE2018.2\8_Data.csv', 'r') as data_file:
    data = csv.reader(data_file, delimiter=',')
    for i in data:
        t = []
        d = []
        dd = []
        t.append([float(i[0])])
        d.append([float(i[1])])
        dd.append([float(i[2])])

t = np.array(t)
d = np.array(d)
dd = np.array(dd)
print(dd)
The root of your problem is that on every iteration of your loop you re-assign t, d and dd to empty lists []. If your end goal is to obtain numpy arrays for these variables, I would recommend using pd.read_csv() to convert your csv file to a dataframe. Take this sample csv:
t,d,dd
1,2,3
4,5,6
7,8,9
Using pd.read_csv():
df = pd.read_csv(r'C:\\Users\jez40\.PyCharmCE2018.2\8_Data.csv')
Gives:
t d dd
0 1 2 3
1 4 5 6
2 7 8 9
Then you can query your columns to return them as pd.Series():
t = df['t']
d = df['d']
dd = df['dd']
Or you can convert them to np.array():
t = np.array(df['t'])
d = np.array(df['d'])
dd = np.array(df['dd'])
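Recent pandas also documents to_numpy() as the preferred way to get the underlying array, which avoids wrapping each Series in np.array(). A small sketch with the same sample data inline:

```python
import io
import numpy as np
import pandas as pd

# The sample csv from above, inline instead of the file path
csv_text = "t,d,dd\n1,2,3\n4,5,6\n7,8,9\n"
df = pd.read_csv(io.StringIO(csv_text))

dd = df['dd'].to_numpy()   # one column as a 1-D array
all_cols = df.to_numpy()   # the whole table as a 2-D array
print(dd, all_cols.shape)
```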
Is there an efficient way to store each column of a tab-delimited file in a separate dictionary using python?
A sample input file: (Real input file contains thousands of lines and hundreds of columns. Number of columns is not fixed, it changes frequently.)
A B C
1 4 7
2 5 8
3 6 9
I need to print values in column A:
for cell in mydict["A"]:
    print cell
and to print values in the same row:
for i in range(1, numrows):
    for key in keysOfMydict:
        print mydict[key][i]
The simplest way is to use DictReader from the csv module:
import csv

with open('somefile.txt', 'r') as f:
    reader = csv.DictReader(f, delimiter='\t')
    rows = list(reader)  # If your file is not large, you can consume it entirely

    # If your file is large, you might want to step over each row instead:
    # for row in reader:
    #     print(row['A'])

for row in rows:
    print(row['A'])
@Marius made a good point: you might be looking to collect each column separately, keyed by its header.
If that's the case, you'll have to adjust your reading logic a bit:
from collections import defaultdict

by_column = defaultdict(list)
for row in rows:
    for k, v in row.iteritems():  # row.items() on Python 3
        by_column[k].append(v)
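Run against the sample table, that loop yields one list per header. A self-contained sketch using an in-memory file, and items() since iteritems() only exists on Python 2:

```python
import csv
import io
from collections import defaultdict

# The sample tab-delimited table from the question, inline
data = "A\tB\tC\n1\t4\t7\n2\t5\t8\n3\t6\t9\n"
reader = csv.DictReader(io.StringIO(data), delimiter='\t')

by_column = defaultdict(list)
for row in reader:
    for k, v in row.items():
        by_column[k].append(v)

print(dict(by_column))
```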
Another option is pandas:
>>> import pandas as pd
>>> i = pd.read_csv('foo.csv', sep=' ')
>>> i
A B C
0 1 4 7
1 2 5 8
2 3 6 9
>>> i['A']
0 1
1 2
2 3
Name: A, dtype: int64
Not sure this is relevant, but you can do this using rpy2.
from rpy2 import robjects
dframe = robjects.DataFrame.from_csvfile('/your/csv/file.csv', sep=' ')
d = dict([(k, list(v)) for k, v in dframe.items()])
output:
{'A': [1, 2, 3], 'C': [7, 8, 9], 'B': [4, 5, 6]}
I'm new to Python and I have a problem removing unwanted rows from a csv file. For instance, I have 3 columns and a lot of rows:
A B C
hi 1.0 5
hello 2.0 6
ok 3.0 7
I loaded the data using numpy (instead of csv)
import numpy as np

a = np.loadtxt('data.csv', delimiter=',', skiprows=1)
I want to introduce a range for the 2nd column
b=np.arange(0, 2.1,0.1)
I don't have any idea how I should use that piece of code.
What I want as a final output is the following:
A B C
hi 1.0 5
hello 2.0 6
The last row would be removed, since I chose a range for the 2nd column up to 2.0 only. I don't have any idea how I can accomplish this.
Try with Pandas:
import pandas as pd
a = pd.read_csv('data.csv', index_col=0) # column A will be the index.
a
B C
A
hi 1 5
hello 2 6
ok 3 7
For every value of B up to 2:
a[a.B <= 2]
B C
A
hi 1 5
hello 2 6
Details :
a.B
A
hi 1
hello 2
ok 3
Name: B, dtype: float64
a.B <= 2
A
hi True
hello True
ok False
Name: B, dtype: bool
You can do it using logical indexing
index = (x[:, 1] <= 2.0)
Then
x = x[index]
selecting only the lines that satisfy this condition
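This snippet assumes x is already a numeric array; np.loadtxt as used in the question would fail on the string column A. A complete sketch of the approach (assuming comma-separated data) reads only the numeric columns:

```python
import io
import numpy as np

# The sample table from the question, inline and comma-separated
csv_text = "A,B,C\nhi,1.0,5\nhello,2.0,6\nok,3.0,7\n"

# usecols skips the string column A; skiprows drops the header
x = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1, usecols=(1, 2))

index = (x[:, 0] <= 2.0)  # column B is now the first numeric column
x = x[index]
print(x)
```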
You can just use the csv module. N.B the following expects that the CSV fields are comma separated, not tab separated (as your sample suggests).
import csv

with open('data.csv') as data:
    reader = csv.reader(data)  # or csv.reader(data, delimiter='\t') for tabs
    field_names = next(reader)
    filtered_rows = [row for row in reader if 0 <= float(row[1]) <= 2.0]
>>> field_names
['A', 'B', 'C']
>>> filtered_rows
[['hi', '1.0', '5'], ['hello', '2.0', '6']]
>>> filtered_rows.insert(0, field_names)
>>> filtered_rows
[['A', 'B', 'C'], ['hi', '1.0', '5'], ['hello', '2.0', '6']]
If you require that values be exact tenths within the required range, then you can do this:
import csv
import numpy as np

allowed_values = np.arange(0, 2.1, 0.1)

with open('data.csv') as data:
    reader = csv.reader(data)
    field_names = next(reader)
    filtered_rows = [row for row in reader if float(row[1]) in allowed_values]
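One caution about exact membership tests against np.arange(0, 2.1, 0.1): floating-point rounding means the array holds values like 0.30000000000000004 rather than 0.3, so a plain `in` check can miss valid tenths. np.isclose is the safer comparison:

```python
import numpy as np

allowed_values = np.arange(0, 2.1, 0.1)

# Plain membership misses some tenths due to rounding in the step
exact = 0.3 in allowed_values
close = np.isclose(allowed_values, 0.3).any()
print(exact, close)
```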
Edit after updated requirements
With extra constraints on column "C", e.g. value must be >= 6.
import csv
import numpy as np

allowed_values_B = np.arange(0, 2.1, 0.1)

def accept_row(row):
    return (float(row[1]) in allowed_values_B) and (int(row[2]) >= 6)

with open('data.csv') as data:
    reader = csv.reader(data)
    field_names = next(reader)
    filtered_rows = [row for row in reader if accept_row(row)]
>>> filtered_rows
[['hello', '2.0', '6']]