Using Pandas, I'm trying to extract a value using its key, but I keep failing to do so. Could you help me with this?
There's a csv file like below:
value
"{""id"":""1234"",""currency"":""USD""}"
"{""id"":""5678"",""currency"":""EUR""}"
I imported this file in Pandas and made a DataFrame out of it:
[Image: dataframe from a csv file]
However, when I try to extract the value using a key (e.g. df["id"]), I get an error message.
I'd like to see a value 1234 or 5678 using df["id"]. Which step should I take to get it done? This may be a very basic question but I need your help. Thanks.
The csv file isn't being read in the way you expect.
pandas splits on commas by default, but every line of your file is wrapped in double quotes, so each line is parsed as one quoted field and the commas inside it are not treated as delimiters. See the read_csv documentation for more on this. Because of that, the pandas dataframe has a single column, value, which has entire lines from your file as individual cells - the first entry is the string {"id":"1234","currency":"USD"}. So, the dataframe doesn't have a column id, and you can't select data by id.
The data aren't laid out the way a pandas df expects, with a header row and columns of data. One option for reading in this data is to process each row manually, though there may be slicker options.
file = 'test.dat'
id_vals = []
currency = []
with open(file, 'r') as f:
    # skip the header line ("value")
    for line in f.readlines()[1:]:
        ## remove obfuscating characters
        for c in '"{}\n':
            line = line.replace(c, '')
        line = line.split(',')
        ## extract values to two lists
        id_vals.append(line[0][3:])    # drop the leading "id:"
        currency.append(line[1][9:])   # drop the leading "currency:"
You just need to clean up the CSV file a little and you are good. Here is every step:
import pandas as pd

# open your csv and read as a text string
with open('My_CSV.csv', 'r') as f:
    my_csv_text = f.read()
# remove problematic strings (plain-text replacement, no regex needed)
find_str = ['{', '}', '"', 'id:', 'currency:', 'value']
replace_str = ''
for i in find_str:
    my_csv_text = my_csv_text.replace(i, replace_str)
# Create new csv file and save cleaned text
new_csv_path = './my_new_csv.csv' # or whatever path and name you want
with open(new_csv_path, 'w') as f:
    f.write(my_csv_text)
# Create pandas dataframe
df = pd.read_csv(new_csv_path, sep=',', names=['ID', 'Currency'])
print(df)
Output df:
ID Currency
0 1234 USD
1 5678 EUR
You need to parse the JSON string in each row of your dataframe using json.loads() or eval(), something like this:
import json
for row in df.itertuples():
    print(json.loads(row.value)["id"])
    # OR (trusted input only: eval() executes arbitrary code)
    print(eval(row.value)["id"])
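As a possible alternative to looping, the JSON strings can be parsed once and expanded into real columns, after which selecting by "id" works. This is only a minimal sketch and not part of the answers above: the inline csv_text stands in for the file from the question, and the name parsed is made up for illustration.

```python
import io
import json

import pandas as pd

# Stand-in for the csv file shown in the question (assumption: same content).
csv_text = (
    'value\n'
    '"{""id"":""1234"",""currency"":""USD""}"\n'
    '"{""id"":""5678"",""currency"":""EUR""}"\n'
)

df = pd.read_csv(io.StringIO(csv_text))

# Parse each JSON string into a dict, then build one column per key.
parsed = pd.DataFrame(df["value"].apply(json.loads).tolist())

print(parsed["id"].tolist())
```

Note that the ids come back as strings, because they are quoted in the JSON; something like parsed["id"].astype(int) could convert them if numbers are needed.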
I am copying list output data from a DataCamp course so I can recreate the exercise in Visual Studio Code or Jupyter Notebook. In the DataCamp Python interactive window, I type the name of the list, highlight the output, and paste it into a new file in VSCode. I use find and replace to delete all the commas and spaces, which leaves 142 numeric values, and I save the file as life_exp.csv. It looks like this:
43.828
76.423
72.301
42.731
75.32
81.235
79.829
75.635
64.062
79.441
When I read the file in using either Pandas read_csv or csv.reader, and then use values.tolist() with Pandas or a for loop to append to an existing, empty list, both approaches give me a list of lists, which then does not display the data correctly when I try to create matplotlib histograms.
I also used NotePad to save the data as a .csv, and both ways of saving the data produce the same issue.
import matplotlib.pyplot as plt
import csv
life_exp = []
with open(r'C:\data\life_exp.csv', 'rt') as life_expcsv:
    exp_read = csv.reader(life_expcsv, delimiter='\n')
    for row in exp_read:
        life_exp.append(row)
And
import pandas as pd
life_exp_df = pd.read_csv('c:\\data\\life_exp.csv', header = None)
life_exp = life_exp_df.values.tolist()
When you print life_exp after importing using csv, you get:
[['43.828'],
['76.423'],
['72.301'],
['42.731'],
['75.32'],
['81.235'],
['79.829'],
['75.635'],
['64.062'],
['79.441'],
['56.728'],
….
And when you print life_exp after importing using pandas read_csv, you get the same thing, but at least now it's not a string:
[[43.828],
[76.423],
[72.301],
[42.731],
[75.32],
[81.235],
[79.829],
[75.635],
[64.062],
[79.441],
[56.728],
…
And when you call plt.hist(life_exp) on either version of the list, each value ends up in its own bin with a count of 1.
I just want to read each value in the csv file and put each value into a simple Python list.
I have spent days scouring stackoverflow thinking someone has done this, but I can't seem to find an answer. I am very new to Python, so your help is greatly appreciated.
Try:
import pandas as pd
life_exp_df = pd.read_csv('c:\\data\\life_exp.csv', header = None)
# Select the values of your first column as a list
life_exp = life_exp_df.iloc[:, 0].tolist()
instead of:
life_exp = life_exp_df.values.tolist()
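To see why the two calls differ, here is a small sketch that uses an in-memory stand-in for life_exp.csv (the sample_csv text and the names nested and flat are assumptions for illustration). .values is a 2-D array, so tolist() yields one inner list per row, while selecting the first column gives a 1-D Series and therefore a flat list.

```python
import io

import pandas as pd

# Three values standing in for the 142 in the real file.
sample_csv = "43.828\n76.423\n72.301\n"
life_exp_df = pd.read_csv(io.StringIO(sample_csv), header=None)

nested = life_exp_df.values.tolist()      # 2-D array -> list of one-element lists
flat = life_exp_df.iloc[:, 0].tolist()    # first column only -> flat list

print(nested)  # [[43.828], [76.423], [72.301]]
print(flat)    # [43.828, 76.423, 72.301]
```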
csv.reader parses each line into a list of fields using the delimiter you provide. In this case you provide \n as the delimiter, so each line holds a single field, but that single item is still returned as a one-element list.
When you append each row, you are essentially appending that list to another list. The simplest work-around is to index into row to extract the value:
import csv
life_exp = []
with open(r'C:\data\life_exp.csv', 'rt') as life_expcsv:
    exp_read = csv.reader(life_expcsv, delimiter='\n')
    for row in exp_read:
        life_exp.append(row[0])  # row is a one-element list
However, if your data is not guaranteed to be formatted the way you have provided, you will need to handle that a bit differently:
with open(r'C:\data\life_exp.csv', 'rt') as life_expcsv:
    exp_read = csv.reader(life_expcsv, delimiter='\n')
    for row in exp_read:
        for number in row:
            life_exp.append(number)
A bit cleaner with a list comprehension, which builds the flat list directly instead of calling append for its side effect:
with open(r'C:\data\life_exp.csv', 'rt') as life_expcsv:
    exp_read = csv.reader(life_expcsv, delimiter='\n')
    life_exp = [number for row in exp_read for number in row]
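One thing to keep in mind is that csv.reader returns every field as a string, so a list built this way holds text such as '43.828' rather than numbers. Below is a minimal sketch of converting while reading, assuming every non-empty line holds exactly one number. The in-memory sample_file stands in for life_exp.csv and is not from the question.

```python
import csv
import io

# Stand-in for the real file, including a blank line to show it is skipped.
sample_file = io.StringIO("43.828\n76.423\n\n72.301\n")

# The default comma delimiter is fine here: each line has a single field.
# Blank lines come back as empty lists, so "if row" filters them out.
life_exp = [float(row[0]) for row in csv.reader(sample_file) if row]

print(life_exp)  # [43.828, 76.423, 72.301]
```

A flat list of floats like this can be passed directly to plt.hist(life_exp).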
Question on how to merge different column values into one without commas in Python.
My task is like this.
A big csv file has the following rows:
s,0,6,8,9,2,-,3,6,2,8,7,1,0,n,.,c,s,v
s,0,5,9,6,0,-,3,6,7,0,1,6,0,n,.,c,s,v
s,1,9,0,5,5,-,3,6,1,5,5,8,6,n,.,c,s,v
s,2,8,0,7,9,-,3,2,5,1,8,2,7,n,.,c,s,v
s,0,0,5,6,5,-,3,3,4,0,5,7,0,n,.,c,s,v
s,3,0,3,4,8,-,3,5,9,1,2,2,6,n,.,c,s,v
s,0,3,8,8,9,-,3,7,3,1,0,2,5,n,.,c,s,v
I want to make this look like the following:
06892
05960
19055
28079
00565
30348
03889
I attempted the following code without success.
import csv, os
with open('/Desktop/case.csv', 'r') as h:
    reader = csv.reader(h)
    for row in reader:
        k = row[1:6]
        print(k)
When I did this, the following results came up.
0,6,8,9,2
0,5,9,6,0
1,9,0,5,5
2,8,0,7,9
0,0,5,6,5
3,0,3,4,8
0,3,8,8,9
How can I make this look like my desired output, i.e. without commas?
Use join:
from io import StringIO
import csv
txtfile = StringIO("""s,0,6,8,9,2,-,3,6,2,8,7,1,0,n,.,c,s,v
s,0,5,9,6,0,-,3,6,7,0,1,6,0,n,.,c,s,v
s,1,9,0,5,5,-,3,6,1,5,5,8,6,n,.,c,s,v
s,2,8,0,7,9,-,3,2,5,1,8,2,7,n,.,c,s,v
s,0,0,5,6,5,-,3,3,4,0,5,7,0,n,.,c,s,v
s,3,0,3,4,8,-,3,5,9,1,2,2,6,n,.,c,s,v
s,0,3,8,8,9,-,3,7,3,1,0,2,5,n,.,c,s,v""")
reader = csv.reader(txtfile)
for row in reader:
    k = row[1:6]
    print(''.join(k))
Output:
06892
05960
19055
28079
00565
30348
03889
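If the csv module is not otherwise needed, a plain string approach may also work, since every character in these rows is separated by a comma. This is a sketch under that assumption; the rows list holds two sample lines from the question and the name codes is made up for illustration.

```python
# Two sample lines from the question, used as stand-in input.
rows = [
    "s,0,6,8,9,2,-,3,6,2,8,7,1,0,n,.,c,s,v",
    "s,0,5,9,6,0,-,3,6,7,0,1,6,0,n,.,c,s,v",
]

# Drop every comma, then take characters 1 to 5 (the five digits after "s").
codes = [line.replace(",", "")[1:6] for line in rows]

print(codes)  # ['06892', '05960']
```

This keeps the results in a list rather than printing them one by one, which may be more convenient if they need to be written out or processed further.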