Pandas csv dataframe to json array - python

I am reading a CSV file and trying to convert the data into a JSON array, but I get the error "only size-1 arrays can be converted to Python scalars".
The CSV file contents are:
4.4.4.4
5.5.5.5
My code is below:
import json
import numpy as np
import pandas as pd
df1 = pd.read_csv('/Users/Documents/datasetfiles/test123.csv', header=None)
df1.head(5)
#    0
# 0  4.4.4.4
# 1  5.5.5.5
df_to_array = np.array(df1)
app_json = json.dumps(df_to_array,default=int)
I need the output to be:
["4.4.4.4", "5.5.5.5", "3.3.3.3"]

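As other answers mentioned, just use a list: json.dumps(list(df1[0]))
FYI, the problem is the array's type and shape: np.array(df1) is a 2-D array with a single column (shape (2, 1) here). json.dumps cannot serialize a NumPy array, so it passes the whole array to default=int, and int() only accepts arrays with exactly one element.
If you absolutely must use NumPy, transpose the array first and take its only row: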
json.dumps(list(df_to_array.transpose()[0]))
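A minimal sketch of the failure and both fixes, assuming the two-line file from the question (simulated here with io.StringIO instead of the real path; csv_text is a hypothetical stand-in). It shows the (2, 1) shape, the TypeError raised when default=int receives the whole array, and two ways of getting a flat list of strings first.

```python
import io
import json

import numpy as np
import pandas as pd

# Hypothetical stand-in for test123.csv: one value per line, no header
csv_text = "4.4.4.4\n5.5.5.5\n"
df1 = pd.read_csv(io.StringIO(csv_text), header=None)

df_to_array = np.array(df1)
print(df_to_array.shape)  # (2, 1): one row per line, one column

# json.dumps cannot encode an ndarray, so it calls default=int on the whole
# array; int() raises TypeError for arrays with more than one element.
try:
    json.dumps(df_to_array, default=int)
except TypeError as exc:
    print("fails:", exc)

# Either flatten the single column with NumPy...
via_numpy = json.dumps(list(df_to_array.transpose()[0]))
# ...or skip NumPy and take the column as a plain list.
via_pandas = json.dumps(df1[0].tolist())
print(via_numpy)   # ["4.4.4.4", "5.5.5.5"]
print(via_pandas)  # ["4.4.4.4", "5.5.5.5"]
```

Both variants hand json.dumps a list of ordinary Python strings, so the default hook is never needed.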

Given test.csv:
4.4.4.4
5.5.5.5
Doing:
import json
with open('test.csv') as f:
    data = f.read().splitlines()
print(data)
print(json.dumps(data))
Output:
['4.4.4.4', '5.5.5.5']
["4.4.4.4", "5.5.5.5"]
You're overcomplicating things by using pandas if this is all you want to do.
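If the file may contain a trailing blank line or Windows line endings, a slightly more defensive version of the same idea strips each line and skips empty ones. A minimal sketch; io.StringIO stands in for open('test.csv'), and the trailing blank line is an assumption added to show the effect.

```python
import io
import json

# Hypothetical stand-in for open('test.csv'), with an extra blank line at the end
f = io.StringIO("4.4.4.4\n5.5.5.5\n\n")

# Strip surrounding whitespace (including '\r') and drop empty lines,
# so they don't become "" entries in the JSON array.
data = [line.strip() for line in f if line.strip()]
print(json.dumps(data))  # ["4.4.4.4", "5.5.5.5"]
```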

import json
import pandas as pd
df1 = pd.read_csv('/Users/Documents/datasetfiles/test123.csv', header=None)
df1.head(5)
#    0
# 0  4.4.4.4
# 1  5.5.5.5
# Take column 0 as a plain Python list of strings, which json can
# serialize directly (no default= hook needed)
df_to_array = list(df1[0])
app_json = json.dumps(df_to_array)
print(app_json)
# ["4.4.4.4", "5.5.5.5"]

Related

How to read an array in a dataframe?

I have a TSV file containing a column of arrays, which I read using read_csv().
The column's dtype is shown as object. How do I read it and access each entry as an array?
For example:
df=
id values
1 [0,1,0,3,5]
2 [0,0,2,3,4]
3 [1,1,0,2,3]
4 [2,4,0,3,5]
5 [3,5,0,3,5]
Currently I am unpacking it like this:
for index, row in df.iterrows():
    string = row['col2']
    string = string.replace('[', "")
    string = string.replace(']', "")
    v1, v2, v3, v4, v5 = string.split(",")
    v1 = int(v1)
    v2 = int(v2)
    v3 = int(v3)
    v4 = int(v4)
    v5 = int(v5)
Is there any alternative to this?
I want to do this because I want to create another column in the dataframe containing the average of the values.
Additional details:
My TSV file looks like this:
id values
1 [0,1,0,3,5]
2 [0,0,2,3,4]
3 [1,1,0,2,3]
4 [2,4,0,3,5]
5 [3,5,0,3,5]
I am reading the tsv file as follows:
df=pd.read_csv('tsv_file_name.tsv',sep='\t', header=0)
You can use json to simplify your parsing:
import json
import numpy as np
# Parse each "[0,1,0,3,5]" string into a Python list of ints
df['col2'] = df['col2'].apply(json.loads)
Edit: following your comment, getting the average is easy:
# using numpy
df['col2_mean'] = df['col2'].apply(lambda t: np.array(t).mean())
# by hand
df['col2_mean'] = df['col2'].apply(lambda t: sum(t) / len(t))
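The same parsing can also happen while the file is read, through read_csv's converters parameter. A minimal sketch, assuming the list column is named values as in the TSV shown (the code above calls it col2) and simulating the file with io.StringIO; tsv_text and values_mean are hypothetical names.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for tsv_file_name.tsv (tab-separated, header row)
tsv_text = (
    "id\tvalues\n"
    "1\t[0,1,0,3,5]\n"
    "2\t[0,0,2,3,4]\n"
    "3\t[1,1,0,2,3]\n"
)

# converters runs json.loads on every cell of "values" during parsing,
# so the column holds Python lists instead of strings.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t", converters={"values": json.loads})

# Use df["values"], not df.values: the attribute is the DataFrame's NumPy array.
df["values_mean"] = df["values"].apply(lambda t: sum(t) / len(t))
print(df)
```

Note the bracket access: a column literally named values cannot be reached as an attribute, because DataFrame.values already exists.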
import csv
with open('myfile.tsv') as tsvfile:
    line = csv.reader(tsvfile, delimiter='...')
    ...
OR
import pandas as pd
# DataFrame.from_csv was removed in pandas 1.0; read_csv replaces it
df = pd.read_csv("myfile.tsv", sep="...")
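Without pandas, the csv module can read the rows and json.loads can turn each bracketed string into a list. A minimal sketch, assuming the file is tab-separated with the id/values header shown in the question; io.StringIO stands in for the real file, and tsv_text and means are hypothetical names.

```python
import csv
import io
import json

# Hypothetical stand-in for myfile.tsv
tsv_text = "id\tvalues\n1\t[0,1,0,3,5]\n2\t[0,0,2,3,4]\n"

means = {}
# DictReader uses the header line ("id", "values") as keys;
# delimiter="\t" assumes the file is tab-separated.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
for row in reader:
    numbers = json.loads(row["values"])  # "[0,1,0,3,5]" -> [0, 1, 0, 3, 5]
    means[row["id"]] = sum(numbers) / len(numbers)
print(means)  # {'1': 1.8, '2': 1.8}
```

The commas inside the brackets are safe here because only tabs act as field separators.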

Save pretty table output as a pandas data frame

How can I convert the output I get from a PrettyTable into a pandas DataFrame and save it as an Excel file?
My code that produces the PrettyTable output:
import difflib
from prettytable import PrettyTable
prtab = PrettyTable()
prtab.field_names = ['Item_1', 'Item_2']
for item in Items_2:
    prtab.add_row([item, difflib.get_close_matches(item, Items_1)])
print(prtab)
I'm trying to convert this to a pandas DataFrame, but I get the error "DataFrame constructor not properly called!". My code to convert it is shown below:
AA = pd.DataFrame(prtab, columns = ['Item_1', 'Item_2']).reset_index()
I found this method recently.
pretty_table.get_csv_string()
This converts the table to a CSV string, which you can write to a CSV file.
I use it like this:
tbl_as_csv = pretty_table.get_csv_string().replace('\r','')
with open("output_path.csv", "w") as text_file:
    text_file.write(tbl_as_csv)
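The CSV string can also go straight back into pandas instead of a file. A minimal sketch, where the hypothetical csv_text stands in for the string returned by get_csv_string(); its exact contents depend on your table. One caveat: list cells such as the get_close_matches results come back as plain strings like "['apple']", not Python lists.

```python
import io

import pandas as pd

# Hypothetical stand-in for pretty_table.get_csv_string()
csv_text = "Item_1,Item_2\napple,['apple']\npear,['peach']\n"

# read_csv accepts any file-like object, so wrap the string in StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

From there, df.to_excel('output.xlsx', index=False) writes the Excel file, provided an Excel engine such as openpyxl is installed. If you need real lists in Item_2, building the DataFrame from the raw rows avoids the round-trip through text.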
Load the data into a DataFrame first, then export to PrettyTable and Excel:
import io
import difflib
import pandas as pd
import prettytable as pt
data = []
for item in Items_2:
    data.append([item, difflib.get_close_matches(item, Items_1)])
df = pd.DataFrame(data, columns=['Item_1', 'Item_2'])
# Export to prettytable
# https://stackoverflow.com/a/18528589/190597 (Ofer)
# Use io.StringIO with Python3, use io.BytesIO with Python2
output = io.StringIO()
df.to_csv(output)
output.seek(0)
print(pt.from_csv(output))
# Export to Excel file
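filename = '/tmp/output.xlsx'
# The workbook is only written when the writer is closed;
# the with-block closes it automatically
with pd.ExcelWriter(filename) as writer:
    df.to_excel(writer, sheet_name='Sheet1')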

Python numpy, skip columns & read csv file

I've got a CSV file with 20 columns and about 60000 rows.
I'd like to read only fields 2 to 20. I've tried the code below, but the browser (using IPython) freezes and it just runs for ages:
import numpy as np
from numpy import genfromtxt
myFile = 'sampleData.csv'
myData = genfromtxt(myFile, delimiter=',', usecols=(2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19))
print(myData)
How could I tweak this to work better and actually produce output?
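import pandas as pd
myFile = 'sampleData.csv'
# read_csv uses the first line as the header by default, so it does not need
# to be skipped; usecols keeps the 0-based columns 2-19, as in the question
df = pd.read_csv(myFile, usecols=range(2, 20))
print(df)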
This works like a charm
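The original genfromtxt call also works once the syntax is fixed (usecols= with a closing parenthesis). A minimal sketch on a small in-memory file, assuming the real file has a header line; csv_text is a hypothetical stand-in with 5 columns instead of 20.

```python
import io

import numpy as np

# Hypothetical stand-in for sampleData.csv: header plus two rows of 5 columns
csv_text = "a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n"

# usecols takes 0-based column indices; skip_header=1 drops the header line
# so it is not parsed as (missing) numbers.
my_data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skip_header=1,
    usecols=tuple(range(1, 5)),  # 1-based fields 2..5
)
print(my_data)
```

For the 20-column file, usecols=tuple(range(1, 20)) selects 1-based fields 2 to 20.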

Unable to convert JSON file to CSV using Python

I'm trying to convert the JSON file below into a CSV file.
JSON file:
[{
"SubmitID":1, "Worksheet":3, "UserID":65,
"Q1":"395",
"Q2":"2178",
"Q3":"2699",
"Q4":"1494"},{
"SubmitID":2, "Worksheet":3, "UserID":65,
"Q4":"1394"},{
"SubmitID":3, "Worksheet":4, "UserID":65,
"Q1":"1629",
"Q2":"1950",
"Q3":"0117",
"Q4":"1816",
"Empty":" "}]
However, my Python code below raises "TypeError: Expected String or Unicode". How should I modify my program to make it work?
import json
import pandas as pd
f2 = open('temp.json')
useful_input = json.load(f2)
df=pd.read_json(useful_input)
print(df)
df.to_csv('results.csv')
You just need to pass the file path as a string to pd.read_json():
df=pd.read_json("temp.json")
You don't need the json module at all:
Try:
import pandas as pd
df=pd.read_json("temp.json")
print(df)
df.to_csv('results.csv')
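If the data has already been loaded with json.load, it is a list of dicts, which the DataFrame constructor accepts directly; pd.read_json expects a path or JSON text, hence the original error. A minimal sketch, with io.StringIO standing in for temp.json and the question's records copied into the hypothetical json_text.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for temp.json (the records from the question)
json_text = """[
 {"SubmitID": 1, "Worksheet": 3, "UserID": 65, "Q1": "395", "Q2": "2178", "Q3": "2699", "Q4": "1494"},
 {"SubmitID": 2, "Worksheet": 3, "UserID": 65, "Q4": "1394"},
 {"SubmitID": 3, "Worksheet": 4, "UserID": 65, "Q1": "1629", "Q2": "1950", "Q3": "0117", "Q4": "1816", "Empty": " "}
]"""

useful_input = json.load(io.StringIO(json_text))  # a list of dicts
df = pd.DataFrame(useful_input)  # keys missing from a record become NaN
print(df)

# to_csv returns the text when no path is given; pass 'results.csv' to write a file
csv_text = df.to_csv(index=False)
```

Building the frame from the parsed list keeps the string values exactly as they appear in the JSON, for example "0117" with its leading zero.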
import pandas as pd
df = pd.read_json('data.json')
df.to_csv('data.csv', index=False, columns=['title', 'subtitle', 'date', 'description'])
import pandas as pd
df = pd.read_csv("data.csv")
df = df[df.columns[:4]]
df = df.dropna(how='all')  # dropna returns a new DataFrame, so reassign it
df.to_json('data.json', orient='records')
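A self-contained run of the CSV-to-JSON snippet above, assuming a small in-memory CSV (the hypothetical csv_text) with one real row and one all-empty row. It shows what orient='records' produces and that the empty row is dropped once dropna's result is reassigned.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for data.csv: five columns, one all-empty row
csv_text = "title,subtitle,date,description,extra\nA,a,2020-01-01,first,x\n,,,,\n"

df = pd.read_csv(io.StringIO(csv_text))
df = df[df.columns[:4]]     # keep only the first four columns
df = df.dropna(how="all")   # drop rows where every kept column is NaN
records_json = df.to_json(orient="records")  # pass a path to write a file instead
print(records_json)
```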

pandas create data frame, floats are objects, how to convert?

I have a text file:
sample value1 value2
A 0.1212 0.2354
B 0.23493 1.3442
I import it like this:
import numpy as np
with open('file.txt', 'r') as fo:
    notes = next(fo)
    headers, *raw_data = [row.strip('\r\n').split('\t') for row in fo]  # get column headers and data
names = [row[0] for row in raw_data]  # extract the first column (sample names)
data = np.array([row[1:] for row in raw_data], dtype=float)  # drop the first column, convert the rest to float
If I then convert it:
s = pd.DataFrame(data, index=names, columns=headers[1:])
the data is recognized as floats, and I can get the sample names back as a column with s = s.reset_index().
If I do
s = pd.DataFrame(raw_data, columns=headers)
the values are stored as strings (dtype object), so I cannot perform standard calculations.
How would you build the DataFrame? Is it better to import the data as a dict?
BTW, I am using Python 3.3.
You can parse your data file directly into a DataFrame as follows:
df = pd.read_csv('file.txt', sep='\t', index_col='sample')
Which will give you:
value1 value2
sample
A 0.12120 0.2354
B 0.23493 1.3442
[2 rows x 2 columns]
Then, you can do your computations.
To parse such a file, use the pandas read_csv function.
Below is a minimal example of read_csv with whitespace as the separator (sep=r'\s+', equivalent to the older delim_whitespace=True, which is deprecated in recent pandas):
import pandas as pd
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
data = \
"""sample value1 value2
A 0.1212 0.2354
B 0.23493 1.3442"""
# Creation of the dataframe
df = pd.read_csv(StringIO(data), sep=r'\s+')
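If the DataFrame has already been built from the split strings, the object columns can still be converted afterwards with pd.to_numeric. A minimal sketch, assuming rows shaped like the question's raw_data (every field still a string) for the two sample rows shown.

```python
import pandas as pd

# Rows as the question builds them: every field is still a string
raw_data = [["A", "0.1212", "0.2354"], ["B", "0.23493", "1.3442"]]
headers = ["sample", "value1", "value2"]

s = pd.DataFrame(raw_data, columns=headers)
print(s.dtypes)  # value1 and value2 are object

# Convert only the numeric columns; to_numeric raises on values it cannot parse
for col in ["value1", "value2"]:
    s[col] = pd.to_numeric(s[col])

print(s.dtypes)  # value1 and value2 are now float64
print(s["value2"].sum())
```

s[col].astype(float) works equally well here; to_numeric additionally offers errors="coerce" to turn unparseable entries into NaN instead of raising.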
