I am working on a requirement to write my JSON output as [{"x": "MaxTemp", "y": "Temp3pm"}], but my current output looks like [MaxTemp, Temp3pm]. The logic is that, as shown in the screenshot, the first word is the x-axis and the second word, after the comma, is the y-axis. Below is my code; I have attached a screenshot of the input data.
import json
x_y_data = list(selected_ri['index'])
x_y_data
ini_string = {'Imp_features_selected_x_y':x_y_data}
# serialize the dict to a JSON string
ini_string = json.dumps(ini_string)
# parse the JSON string back into a Python dict
final_dictionary = json.loads(ini_string)
You could use str.split to split the text on ',' and expand it into two columns, for example:
df = df['index'].str.split(',', expand=True)
# then rename column name to x and y
df.columns = ['x', 'y']
Then you can convert it into a list of dicts and output it as JSON:
data = df.to_dict('records')
ini_string = json.dumps(data)
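The steps above can be put together in a small self-contained sketch. The DataFrame contents below are made-up sample values standing in for the data in the screenshot, and the name pairs is only illustrative.

```python
import json

import pandas as pd

# Hypothetical sample data: each cell holds "x_axis,y_axis"
df = pd.DataFrame({"index": ["MaxTemp,Temp3pm", "Humidity9am,Rainfall"]})

# Split on the comma and spread the two parts over two columns
pairs = df["index"].str.split(",", expand=True)
pairs.columns = ["x", "y"]

# One dict per row, then serialize the list of dicts
records = pairs.to_dict("records")
output = json.dumps(records)
print(output)
```

If the real values contain a space after the comma, stripping each column with .str.strip() before calling to_dict should remove it.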
I have downloaded the har file of an interactive chart and have the datapoints in the following format:
'{"x":"2022-03-28T00:00:00Z"', '"value":0.2615}',
'{"x":"2022-03-29T00:00:00Z"', '"value":0.2573}',
'{"x":"2022-03-30T00:00:00Z"', '"value":0.272}', ...
What would be the easiest way to convert this into a pandas dataframe?
Both the date and the value should be columns of the dataframe.
The first problem is that every element is inside ' ', so it is treated as two items/columns, but it should be treated as a single item/dictionary. You may need to replace ', ' with , to get a normal string containing JSON, which you can then convert to a Python dictionary using the json module.
text = open(filename).read()
text = text.replace("', '", ",")
and later you can use io.StringIO() to load it from text.
It needs quotechar="'" to read it correctly
df = pd.read_csv(io.StringIO(text), names=['data', 'other'], quotechar="'")
next you can convert every JSON string to python dictionary
df['data'] = df['data'].apply(json.loads)
and next convert dictionary to pd.Series which you can split to columns
df[['x','value']] = df['data'].apply(pd.Series)
Finally you may remove columns data, other
del df['data']
del df['other']
Full working example
text = """'{"x":"2022-03-28T00:00:00Z"', '"value":0.2615}',
'{"x":"2022-03-29T00:00:00Z"', '"value":0.2573}',
'{"x":"2022-03-30T00:00:00Z"', '"value":0.272}',"""
import pandas as pd
import io
import json
#text = open(filename).read()
text = text.replace("', '", ",")
#print(text)
# read from string
df = pd.read_csv(io.StringIO(text), names=['data', 'other'], quotechar="'")
# convert string to dictionary
df['data'] = df['data'].apply(json.loads)
# split dictionary in separated columns
df[['x','value']] = df['data'].apply(pd.Series)
# remove some columns
del df['data']
del df['other']
print(df)
Result:
                      x   value
0  2022-03-28T00:00:00Z  0.2615
1  2022-03-29T00:00:00Z  0.2573
2  2022-03-30T00:00:00Z  0.2720
You can also write some part in one line
df[['x','value']] = df['data'].apply(lambda item: pd.Series(json.loads(item)))
or split it separately (using .str[key] on the dictionary)
df['data'] = df['data'].apply(json.loads)
df['x'] = df['data'].str['x']
df['value'] = df['data'].str['value']
BTW:
you may also need to convert x from string to datetime
df['x'] = pd.to_datetime(df['x'])
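As a small sketch of that last step, assuming the x column holds ISO 8601 strings with a trailing Z like the ones in the question (the two rows below are copied from it):

```python
import pandas as pd

df = pd.DataFrame({
    "x": ["2022-03-28T00:00:00Z", "2022-03-29T00:00:00Z"],
    "value": [0.2615, 0.2573],
})

# The trailing "Z" marks UTC, so the result is a timezone-aware datetime column
df["x"] = pd.to_datetime(df["x"])
print(df.dtypes)
```

Once the column is a datetime, accessors such as df["x"].dt.day become available.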
What I want to do is save a list of strings into a single cell of a CSV file in Python.
When I save it with df.to_csv('file.csv'), the output looks like this:
date,content
20, "['banana', 'apple',...]"
I want to save the list without apostrophes, like this:
date, content
20, "[banana, apple, ...]"
Any help would be appreciated.
Here is the code I used to generate the list:
abstracts = []
for t in tdf_groupby_date:
    nested_list = t[1].to_list()  # nested list
    flat_list = [item for sublist in nested_list for item in sublist]  # flat list
    abstracts.append(flat_list)
t is like this
I solved this issue.
I converted the list to a string and removed the apostrophes and other characters.
The suggestions you gave were right, but in my case the values still were not converted,
so I used another function for the conversion and it worked!
Here is the code I used, for future reference.
def convert_list_to_str(data):
    data = str(data)
    data = data.replace("'",'')
    data = data.replace(' ','')
    data = data.replace('[','')
    data = data.replace(']','')
    return data
df['abstracts'] = df.abstracts.apply(convert_list_to_str)
I suggest the string replace function from pandas:
import pandas as pd
df = pd.DataFrame({"date": [20,], "content": ["['banana', 'apple']", ]})
df
date content
0 20 ['banana', 'apple']
df.content = df.content.str.replace("'", "")
df
date content
0 20 [banana, apple]
df.to_csv("file.csv")
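Edit: it seems I forgot that content is a DataFrame column, not a plain variable. Calling str() on the whole column would stringify the entire Series, so convert each cell instead:
df["content"] = df["content"].astype(str).str.replace("'", "")
For a single list held in a plain variable, try this before you call df.to_csv('file.csv'):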
content = str(content).replace("'","")
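A minimal sketch of the whole round trip, assuming the content column holds real Python lists (the sample row is made up to mirror the question). It joins the items explicitly instead of stripping characters from the list's repr, and writes to an in-memory buffer so the CSV text can be inspected.

```python
import io

import pandas as pd

# Hypothetical data: one cell holding a Python list of strings
df = pd.DataFrame({"date": [20], "content": [["banana", "apple"]]})

# Build "[banana, apple]" by joining the items, so no apostrophes are ever produced
df["content"] = df["content"].apply(
    lambda items: "[" + ", ".join(map(str, items)) + "]"
)

# Write to a buffer instead of a file to look at the result
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Joining the items also keeps apostrophes that belong to the data itself, which a blanket replace("'", "") would delete.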
Hello, I want to put each "0" in its own Excel cell. How can I do this?
This is my Python code for converting the JSON to a DataFrame and then to Excel:
with urllib.request.urlopen("myurl.json") as url:
    data = json.loads(url.read().decode())
df = pd.DataFrame(data)
df1 = df.stack().swaplevel()
And this is the output image:
[Image: current output]
This is how I want it to look:
[Image: desired output]
And this is my json file
{"ADMİN":{"MUSTAFA SEMİH YAMAN":[null,0,0,0,0,1,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,1,0,1,1,1,1,1,1,0,0]}}
Assuming there will always be only one key/value pair, the following piece of code should work.
data = json.loads(url.read().decode())
role = list(data.keys())[0]
name = list(data[role].keys())[0]
listData = data[role][name]
columns = ['', '']+list(range(0, len(listData)))
df = pd.DataFrame([role, name]+listData).T #transpose so the values form a single row
df.columns = columns
df.to_csv("result.csv",index=False)
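The same idea as a self-contained sketch that runs without the URL. The JSON string is a shortened, made-up stand-in for the file in the question, and the column names role and name are illustrative choices rather than anything the original code uses.

```python
import json

import pandas as pd

# Shortened, hypothetical stand-in for the downloaded JSON
raw = '{"ADMIN": {"SOME NAME": [null, 0, 0, 1]}}'
data = json.loads(raw)

# Assumes exactly one role containing exactly one name
role, names = next(iter(data.items()))
name, values = next(iter(names.items()))

# One row: role, name, then one cell per list element
row = [role, name] + values
columns = ["role", "name"] + list(range(len(values)))
df = pd.DataFrame([row], columns=columns)
print(df)
```

From here df.to_excel("result.xlsx", index=False) should write one value per cell, provided an Excel writer engine such as openpyxl is installed.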
I have a for loop that gets data from a website, and I would like to export it to an xlsx or csv file.
When I print the result inside the loop I get the whole list, but when I export it to an xlsx file I only get the last item. Where is the problem? Can you help?
for item1 in spec:
    spec2 = item1.find_all('th')
    expl2 = item1.find_all('td')
    spec2x = spec2[a].text
    expl2x = expl2[a].text
    yazim = spec2x + ': ' + expl2x
    cumle = yazim
    patern = r"(Brand|Series|Model|Operating System|CPU|Screen|MemoryStorage|Graphics Card|Video Memory|Dimensions|Screen Size|Touchscreen|Display Type|Resolution|GPU|Video Memory|Graphic Type|SSD|Bluetooth|USB)"
    if re.search(patern, cumle):
        speclist = translator.translate(cumle, lang_tgt='tr')
        specl = speclist
        #print(specl)
import pandas as pd
exp = [{ 'Prospec': specl,},]
df = pd.DataFrame(exp, columns = ['Prospec',])
df.to_excel('output1.xlsx',)
Create an empty list and, at each iteration in your for loop, append a data frame to the list. You will end up with a list of data frames. After the loop, use pd.concat() to create a new data frame by concatenating every element of your list. You can then save the resulting df to an excel file.
Your code would look something like this:
import pandas as pd
df_list = []
for item1 in spec:
    ......
    if re.search(patern, cumle):
        ....
        df_list.append(pd.DataFrame(.....))
df = pd.concat(df_list)
df.to_excel(.....)
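A runnable sketch of that pattern, with the scraping and translation replaced by a hypothetical list of already extracted strings (lines) and a shortened pattern, since the original page and translator are not available here:

```python
import re

import pandas as pd

# Hypothetical stand-ins for the scraped "spec: explanation" strings
lines = ["Brand: Acme", "Weight: 2 kg", "CPU: Example 3.0 GHz"]
pattern = r"(Brand|CPU|Screen)"

df_list = []
for line in lines:
    if re.search(pattern, line):
        # One single-row data frame per matching item
        df_list.append(pd.DataFrame([{"Prospec": line}]))

# Combine all collected rows after the loop has finished
df = pd.concat(df_list, ignore_index=True)
print(df)
```

The export call then goes after the concat, so the file is written once with every row instead of being overwritten on each iteration.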
I'm trying to split some data from a GPS module. The module prints GPS coordinates using multiple types. I need to be able to split the data type (the token starting with $) from the integers and other strings later in that row.
#read in data
data = pd.read_fwf('/home/harry/Desktop/catTest')
#convert to csv file
data.to_csv('GPS.csv')
X = pd.read_csv('GPS.csv')
#Keep all values
GPS = X.iloc[:].values
#Test on random string
Test_string = GPS[5,:]
#separate string and int
result = [x.strip() for x in Test_string.split(',')]
print(Test_string)
print(result)
AttributeError: 'numpy.ndarray' object has no attribute 'split'
I want to print each item in the row on separate rows.
How can I fix this?
This is what the 5th row item looks like when printed.
[5 '$GPTXT,01,01,02,LLC FFFFFFFF-FFFFFFFF-FFFFFFFF-FFFFFFFF-FFFFFFFD*2C']
data = pd.read_fwf('/home/harry/Desktop/catTest')
data.to_csv('GPS.csv')
X = pd.read_csv('GPS.csv')
#Select one row as a pandas Series instead of a NumPy array
Test_string = X.iloc[5]
#split each item of the row on commas
result = Test_string.apply(lambda x: str(x).split(','))
print(Test_string)
print(result)
You are selecting a NumPy ndarray and then calling split on it, which fails because an ndarray has no split method. Select a single value, or keep the row as a pandas Series, and split each item with a lambda.
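A minimal sketch of the single-value approach, using the row printed in the question as a hand-built array (the GPS file itself is not available here). The names sentence and fields are illustrative.

```python
import numpy as np

# The row from the question: an index followed by the raw GPS sentence
row = np.array(
    [5, "$GPTXT,01,01,02,LLC FFFFFFFF-FFFFFFFF-FFFFFFFF-FFFFFFFF-FFFFFFFD*2C"],
    dtype=object,
)

# Pick the string element, not the whole array, before splitting
sentence = row[1]
fields = [part.strip() for part in sentence.split(",")]

# Print each item on its own line
for field in fields:
    print(field)
```

With the original variables, the equivalent selection would be GPS[5, 1] rather than GPS[5, :].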