Formatting of JSON file - python

Can we convert the highlighted INTEGER values to STRING values (see the link below)?
https://i.stack.imgur.com/3JbLQ.png
Code:
import pandas as pd
filename = "newsample2.csv"
jsonFileName = "myjson2.json"
df = pd.read_csv(filename)
df.to_json(jsonFileName, indent=4)
print(df)

Try doing something like this.
import pandas as pd
filename = "newsample2.csv"
jsonFileName = "myjson2.json"
df = pd.read_csv(filename)
df['index'] = df.index
df.to_json(jsonFileName, indent=4)
print(df)
This will take indices of your data and store them in the index column, so they will become a part of your data.
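If the highlighted values are ordinary column values (an assumption, since the screenshot is not reproduced here), a minimal sketch would be to cast the column with astype(str) before writing the JSON. The CSV content and the column names id and name below are made up for illustration.

```python
import io
import json

import pandas as pd

# Hypothetical CSV content standing in for newsample2.csv
csv_text = "id,name\n1,alpha\n2,beta\n"
df = pd.read_csv(io.StringIO(csv_text))

# Cast the integer column to strings before serialising
df["id"] = df["id"].astype(str)

# With no path given, to_json returns the JSON text instead of writing a file
json_text = df.to_json(indent=4)
parsed = json.loads(json_text)
print(parsed["id"])
```

The values under id now come out quoted in the JSON, while the rest of the frame is unchanged.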

Related

Pandas dataframe throwing error when appending to CSV

import pandas as pd
df = pd.read_csv("stack.csv")
sector_select = "Col2"
df[sector_select] = ["100"]
df.to_csv("stack.csv", index=False, mode='a', header=False)
stack.csv has no data other than a header: Col1,Col2,Col3,Col4,Col5
ValueError: Length of values (1) does not match length of index (2)
I'm just trying to make a program where I can select a header and append data to the column under that header.
It only runs twice before it gives the error!
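You can use this (note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so it only works on older versions):
df = df.append({"Col2": 100}, ignore_index=True)
That code runs for me.
But I assume that you would like to run something like this: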
import pandas as pd
df = pd.read_csv("stack.csv")
sector_select = "Col2"
df.at[len(df), sector_select] = "100"
df.to_csv("stack.csv", index=False)
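On pandas 2.0 and later, where DataFrame.append no longer exists, a rough equivalent is pd.concat. The sketch below builds the header-only frame in memory instead of reading stack.csv, so that it runs on its own. The name new_row is only illustrative.

```python
import pandas as pd

# Stand-in for a stack.csv that contains only the header row
df = pd.DataFrame(columns=["Col1", "Col2", "Col3", "Col4", "Col5"])

# Build a one-row frame for the selected column and concatenate it
sector_select = "Col2"
new_row = pd.DataFrame([{sector_select: "100"}])
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```

Columns that are missing from new_row are filled with NaN, which to_csv writes as empty fields by default.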

Handle variable as file with pandas dataframe

I would like to create a pandas dataframe out of a list variable.
With pd.DataFrame() I am not able to declare a delimiter, which leads to just one column per list entry.
If I use pd.read_csv() instead, I of course receive the following error:
ValueError: Invalid file path or buffer object type: <class 'list'>
Is there a way to use pd.read_csv() with my list without first saving the list to a CSV file and reading that file in a second step?
I also tried pd.read_table(), which also needs a file or buffer object.
Example data (separated by tab stops):
Col1 Col2 Col3
12 Info1 34.1
15 Info4 674.1
test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]
Current workaround:
with open(f'{filepath}tmp.csv', 'w', encoding='UTF8') as f:
    for line in consolidated_file:
        f.write(line + "\n")
df = pd.read_csv(f'{filepath}tmp.csv', sep='\t', index_col=1)
import pandas as pd
df = pd.DataFrame([x.split('\t') for x in test])
print(df)
And if you want the first row to be the header, then:
df.columns = df.iloc[0]
df = df[1:]
It seems simpler to convert it to a nested list, as in the other answer:
import pandas as pd
test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]
data = [line.split('\t') for line in test]
df = pd.DataFrame(data[1:], columns=data[0])
But you can also convert it back to a single string (or get it directly from a file or socket/network as a single string) and then use io.BytesIO or io.StringIO to simulate a file in memory.
import pandas as pd
import io
test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]
single_string = "\n".join(test)
file_like_object = io.StringIO(single_string)
df = pd.read_csv(file_like_object, sep='\t')
Or, shorter:
df = pd.read_csv(io.StringIO("\n".join(test)), sep='\t')
This method is popular when you get data from the network (socket, web API) as a single string or as bytes.
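One practical difference between the two approaches, sketched below with the example data from the question: splitting the strings by hand leaves every value as text, while read_csv infers numeric types. The names df_split and df_csv are only illustrative.

```python
import io

import pandas as pd

test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1", "15\tInfo4\t674.1"]

# Approach 1: split each line manually; every cell stays a Python string
rows = [line.split("\t") for line in test]
df_split = pd.DataFrame(rows[1:], columns=rows[0])

# Approach 2: let read_csv parse an in-memory file; dtypes are inferred
df_csv = pd.read_csv(io.StringIO("\n".join(test)), sep="\t")

print(df_split.dtypes)
print(df_csv.dtypes)
```

If the numeric columns are needed as numbers, the StringIO route saves a separate conversion step.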

Pandas DataFrame - KeyError: 'date'

For a current project, I am working with a large Pandas DataFrame sourced from a JSON file.
As soon as I access specific fields of the JSON file within Pandas, I get key errors such as KeyError: 'date' for the line df['date'] = pd.to_datetime(df['date']).
I have already excluded the identifier/object wording as a possible source for the error. Is there any smart tweak to make this code work?
The JSON file has the following structure:
[
{"stock_symbol": "AMG", "date": "2013-01-01", "txt_main": "ABC"}
]
And the corresponding code section looks like this:
import string
import json
import pandas as pd
# Loading and normalising the input file
file = open("sp500.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df = pd.DataFrame().fillna("")
# Datetime conversion
df['date'] = pd.to_datetime(df['date'])
Take a look at the documentation examples for the fillna function.
By doing df = pd.DataFrame().fillna("") you are overriding your previous df with a new (empty) dataframe. You can just apply it this way: df = df.fillna("").
In [42]: import string
    ...: import json
    ...: import pandas as pd
    ...:
    ...: # Loading and normalising the input file
    ...: #file = open("sp500.json", "r")
    ...: #data = json.load(file)
    ...: df = pd.json_normalize(a)  # a: presumably the parsed JSON list from the question
    ...: #df = pd.DataFrame().fillna("")
    ...:
    ...: # Datetime conversion
    ...: df['date'] = pd.to_datetime(df['date'])
In [43]: df
Out[43]:
  stock_symbol       date txt_main
0          AMG 2013-01-01      ABC
df = pd.DataFrame().fillna("") creates a new, empty dataframe and replaces its NaN values (of which there are none) with empty strings.
So, change that line to df = df.fillna("")
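To see the difference without the sp500.json file, here is a small sketch that uses the single record from the question as inline data. The variable name empty is only illustrative.

```python
import pandas as pd

# Inline stand-in for the contents of sp500.json
data = [{"stock_symbol": "AMG", "date": "2013-01-01", "txt_main": "ABC"}]
df = pd.json_normalize(data)

# The problematic call: a brand-new frame with no rows and no columns
empty = pd.DataFrame().fillna("")
print(empty.columns.tolist())

# The fix: call fillna on the existing frame, then convert the column
df = df.fillna("")
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)
```

Because empty has no columns at all, looking up 'date' on it is what raises the KeyError.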
You are using df = pd.DataFrame().fillna(""), which creates a new dataframe and fills its NaN values with empty strings.
Here the old df is replaced by an empty dataframe, so there is no column named date. Instead, you can fill the NaN values using df.fillna("").
import string
import json
import pandas as pd
# Loading and normalising the input file
file = open("sp500.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df = df.fillna("")
# Datetime conversion
df['date'] = pd.to_datetime(df['date'])
Thank you

JSON to CSV with Leading Zeros

I'm writing code to convert JSON to CSV, where I need to retain the leading zeros.
I have the file emp.json, which has numeric values in a tag, e.g. 000, 001, etc., along with other tags.
import pandas as pd
df = pd.read_json('emp.json')
df.to_csv('test1.csv', index= False)
I get the CSV file, but the leading zeros in the column are removed.
Convert the data type to string:
import pandas as pd
df = pd.read_json('emp.json',dtype=str)
df.to_csv('test1.csv', index= False)
Another way to do it
import json
import pandas as pd
jsondata = '[{"Code":"001","Description":"Afghanistan"},{"Code":"002","Description":"Albania"}]'
jdata = json.loads(jsondata)
df = pd.DataFrame(jdata)
print (df.T)
df.to_csv('test1.csv', index= False)
Code: https://repl.it/repls/BurdensomeCompassionateCommercialsoftware
Maybe pass object as the dtype argument:
import pandas as pd
df = pd.read_json('emp.json',dtype=object)
df.to_csv('test1.csv', index= False)
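object is not a synonym of str, but pandas has traditionally stored strings in object-dtype columns, so the values are kept as strings.
Or you can use str: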
import pandas as pd
df = pd.read_json('emp.json',dtype=str)
df.to_csv('test1.csv', index= False)
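A self-contained way to check this, sketched with an in-memory JSON string in place of emp.json (the sample records are borrowed from the other answer). The names df_default and df_str are only illustrative.

```python
import io

import pandas as pd

# In-memory stand-in for emp.json
jsondata = '[{"Code":"001","Description":"Afghanistan"},{"Code":"002","Description":"Albania"}]'

# Default behaviour: read_json infers dtypes, so "001" is converted to a number
df_default = pd.read_json(io.StringIO(jsondata))

# Forcing str keeps the values exactly as written in the JSON
df_str = pd.read_json(io.StringIO(jsondata), dtype=str)

# With no path given, to_csv returns the CSV text instead of writing a file
csv_text = df_str.to_csv(index=False)
print(csv_text)
```

Keep in mind that reading the CSV back with pd.read_csv drops the zeros again unless a dtype is passed there as well.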

Function to return stripped dataframe

I have a dataframe from CSV file:
import pandas as pd
filename = 'mike.csv'
main_df = pd.read_csv(filename)
I need a function that strips whitespace from the contents of all string columns (there are also numeric columns) and then returns the stripped dataframe. In the function below, the stripping seems to work fine, but I don't know how to return the stripped dataframe:
def strip_whitespace(dataframe):
    dataframe_strings = dataframe.select_dtypes(['object'])
    dataframe[dataframe_strings.columns] = dataframe_strings.apply(lambda x: x.str.strip())
    return # how to return a stripped dataframe here?
Full code:
import pandas as pd
filename = 'mike.csv'
main_df = pd.read_csv(filename)
def strip_whitespace(dataframe):
    dataframe_strings = dataframe.select_dtypes(['object'])
    dataframe[dataframe_strings.columns] = dataframe_strings.apply(lambda x: x.str.strip())
    return stripped_dataframe # ?
stripped_main_df = strip_whitespace(main_df) # should be stripped df
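I believe you need the parameter skipinitialspace=True in read_csv:
main_df = pd.read_csv(filename, skipinitialspace=True)
Then stripping leading whitespace in the columns is not necessary (this option only skips spaces after the delimiter, so trailing whitespace is kept).
But if you need to use your function, end it with:
return dataframe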
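Put together, the function could look like the sketch below. It works on a copy so the caller's dataframe is left untouched, which is a design choice and not something the question requires. It also assumes the string columns have object dtype, as in the question. The sample frame is made up for illustration.

```python
import pandas as pd


def strip_whitespace(dataframe):
    """Return a copy with leading/trailing whitespace stripped from object columns."""
    result = dataframe.copy()
    string_columns = result.select_dtypes(include=["object"]).columns
    for column in string_columns:
        result[column] = result[column].str.strip()
    return result


# Hypothetical sample data standing in for mike.csv
sample = pd.DataFrame({
    "name": pd.Series([" a ", "b "], dtype=object),
    "n": [1, 2],
})
stripped = strip_whitespace(sample)
print(stripped)
```

Numeric columns pass through unchanged because select_dtypes only picks the object columns.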
