Using Pandas, I'm trying to extract value using the key but I keep failing to do so. Could you help me with this?
There's a csv file like below:
value
"{""id"":""1234"",""currency"":""USD""}"
"{""id"":""5678"",""currency"":""EUR""}"
I imported this file in Pandas and made a DataFrame out of it:
dataframe from a csv file
However, when I tried to extract the value using a key (e.g. df["id"]), I'm facing an error message.
I'd like to see a value 1234 or 5678 using df["id"]. Which step should I take to get it done? This may be a very basic question but I need your help. Thanks.
The csv file isn't being read in correctly.
You haven't set a delimiter; pandas can automatically detect a delimiter but hasn't done so in your case. See the read_csv documentation for more on this. Because the , the pandas dataframe has a single column, value, which has entire lines from your file as individual cells - the first entry is "{""id"":""1234"",""currency"":""USD""}". So, the file doesn't have a column id, and you can't select data by id.
The data aren't formatted as a pandas df, with row titles and columns of data. One option is to read in this data is to manually process each row, though there may be slicker options.
file = 'test.dat'
f = open(file,'r')
id_vals = []
currency = []
for line in f.readlines()[1:]:
## remove obfuscating characters
for c in '"{}\n':
line = line.replace(c,'')
line = line.split(',')
## extract values to two lists
id_vals.append(line[0][3:])
currency.append(line[1][9:])
You just need to clean up the CSV file a little and you are good. Here is every step:
# open your csv and read as a text string
with open('My_CSV.csv', 'r') as f:
my_csv_text = f.read()
# remove problematic strings
find_str = ['{', '}', '"', 'id:', 'currency:','value']
replace_str = ''
for i in find_str:
my_csv_text = re.sub(i, replace_str, my_csv_text)
# Create new csv file and save cleaned text
new_csv_path = './my_new_csv.csv' # or whatever path and name you want
with open(new_csv_path, 'w') as f:
f.write(my_csv_text)
# Create pandas dataframe
df = pd.read_csv('my_new_csv.csv', sep=',', names=['ID', 'Currency'])
print(df)
Output df:
ID Currency
0 1234 USD
1 5678 EUR
You need to extract each row of your dataframe using json.loads() or eval()
something like this:
import json
for row in df.iteritems():
print(json.loads(row.value)["id"])
# OR
print(eval(row.value)["id"])
So I have a csv file where I need to filter the rows based on the values that I have on a txt file. Is there an easy way to do this on pandas? The csv will have about 2000 rows and the txt file has about 400 data points. I need to generate a csv with rows that match the data on the txt file.
The CSV file looks like this:
Chromosome Gene Start End
1 PERM1 5 6
2 AGRN 7 10
3 MIB2 9 12
The Text file looks like
PERM1
NADK
GNB1
Thank you
First read Text file into a list or tuple:
lines = tuple(open(filename, 'r'))
Then filter the lines exist in the text file:
df = read_csv('csvfile')
result = df[df['Chromosome Gene'].isin(lines)]
easy enough using pandas read and filter functionality. I'm assuming you have an input .csv file called input_csv_file and a filter file called filter.csv. Input file has a column "filter_locatitons" and input_file has a column called "locations":
input_df = pd.read_csv('input_csv_file.csv')
filter_df = pd.read_csv('filter.csv')
filtered_df = input_df[[input_df['location'].isin(filter_df['filter_locations']]
This can be achieved using mask and loading both the files in dataframes. Below code assumes your test file does not have a header and your csv file is space delimited
import pandas as pd
df1 = pd.read_csv('csvfile.csv', delimiter=' ')
df2 = pd.read_csv('textfile.txt', header=None)
df2.columns = ['Gene']
m = df1.Gene.isin(df2.Gene)
df3 = df1[m]
print(df3)
I have a csv file, where the columns are separated by tab delimiter but the number of columns is not constant. I need to read the file up to the 5th column. (I dont want to ready the whole file and then extract the columns, I would like to read for example line by line and skip the remaining columns)
You can use usecols argument in pd.read_csv to limit the number of columns to be read.
# test data
s = '''a,b,c
1,2,3'''
with open('a.txt', 'w') as f:
print(s, file=f)
df1 = pd.read_csv("a.txt", usecols=range(1))
df2 = pd.read_csv("a.txt", usecols=range(2))
print(df1)
print()
print(df2)
# output
# a
#0 1
#
# a b
#0 1 2
You can use pandas nrows to read only a certain number of csv lines:
import pandas as pd
df = pd.read_csv('out122.txt', usecols=[0,1,2,3,4])
I have a .csv file having multiple rows and columns:
chain Resid Res K B C Tr Kw Bw Cw Tw
A 1 ASP 1 0 0.000104504 NA 0 0 0.100087974 0.573972285
A 2 GLU 2 627 0.000111832 0 0.033974309 0.004533331 0.107822844 0.441666022
Whenever I open the file using pandas or using with open, it shows that there are only column and multiple rows:
629 rows x 1 columns
Here is the code im using:
data= pd.read_csv("F:/file.csv", sep='\t')
print(data)
and the result I'm getting is this"
A,1,ASP,1,0,0.0001045041279130...
I want the output to be in a dataframe form so that I can carry out future calculations. Any help will be highly appreciated
There is separator ,, so is psosible omit parameter sep, because sep=',' is deafault separator in read_csv:
data= pd.read_csv("F:/file.csv")
you can read the csv using the following code snippet
import pandas as pd
data = pd.read_csv('F:/file.csv', sep=',')
Don't use '\t', because you don't have four consecutive spaces together (a tab between), so use the below:
data = pd.read_csv("F:/file.csv")
Or if really needed, use:
data = pd.read_csv("F:/file.csv", sep='\s{2,}', engine='python')
If your data values have spaces.
I have a .csv to read into a DataFrame and the names of the columns are in the same .csv file in the previos rows. Usually I drop all the 'unnecesary' rows to create the DataFrame and then hardcode the names of each dataframe
Trigger time,2017-07-31,10:45:38
CH,Signal name,Input,Range,Filter,Span
CH1, "Tin_MIX_Air",TEMP,PT,Off,2000.000000,-200.000000,degC
CH2, "Tout_Fan2b",TEMP,PT,Off,2000.000000,-200.000000,degC
CH3, "Tout_Fan2a",TEMP,PT,Off,2000.000000,-200.000000,degC
CH4, "Tout_Fan1a",TEMP,PT,Off,2000.000000,-200.000000,degC
Here you can see the rows where the columns names are in double quotes "TinMix","Tout..",etc there are exactly 16 rows with names
Logic/Pulse,Off
Data
Number,Date&Time,ms,CH1,CH2,CH3,CH4,CH5,CH7,CH8,CH9,CH10,CH11,CH12,CH13,CH14,CH15,CH16,CH20,Alarm1-10,Alarm11-20,AlarmOut
NO.,Time,ms,degC,degC,degC,degC,degC,degC,%RH,%RH,degC,degC,degC,degC,degC,Pa,Pa,A,A1234567890,A1234567890,A1234
1,2017-07-31 10:45:38,000,+25.6,+26.2,+26.1,+26.0,+26.3,+25.7,+43.70,+37.22,+25.6,+25.3,+25.1,+25.3,+25.3,+0.25,+0.15,+0.00,LLLLLLLLLL,LLLLLLLLLL,LLLL
And here the values of each variables start.
What I need to do is create a Dataframe from this .csv and place these names in the columns names. I'm new to Python and I'm not very sure how to do it
import pandas as pd
path = r'path-to-file.csv'
data=pd.DataFrame()
with open(path, 'r') as f:
for line in f:
data = pd.concat([data, pd.DataFrame([tuple(line.strip().split(','))])], ignore_index=True)
data.drop(data.index[range(0,29)],inplace=True)
x=len(data.iloc[0])
data.drop(data.columns[[0,1,2,x-1,x-2,x-3]],axis=1,inplace=True)
data.reset_index(drop=True,inplace=True)
data = data.T.reset_index(drop=True).T
data = data.apply(pd.to_numeric)
This is what I've done so far to get my dataframe with the usefull data, I'm dropping all the other columns that arent useful to me and keeping only the values. Last three lines are to reset row/column indexes and to transform the whole df to floats. What I would like is to name the columns with each of the names I showed in the first piece of coding as a I said before I'm doing this manually as:
data.columns = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p']
But I would like to get them from the .csv file since theres a possibility on changing the CH# - "Name" combination
Thank you very much for the help!
Comment: possible for it to work within the other "OPEN " loop that I have?
Assume Column Names from Row 2 up to 6, Data from Row 7 up to EOF.
For instance (untested code)
data = None
columns = []
with open (path) as fh:
for row, line in enumerate (fh, 1):
if row > 2 and row <= 6:
ch, name = line.split(',')[:2]
columns.append(name)
else:
row_data = [tuple(line.strip().split(','))]
if not data:
data = pd.DataFrame(row_data, columns=columns, ignore_index=True)
else:
data.append(row_data)
Question: ... I would like to get them from the .csv file
Start with:
with open (path) as fh:
for row, line in enumerate (fh, 1):
if row > 2:
ch, name = line.split(',')[:2]