How do I capture the properties I want from a string? - python

I hope you are well. I have the following string:
"{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"},....\"childProducts\":[]}}"...
I'm trying to capture the attributes id, idType, and suscriptionId and map them into a dataframe, but the entire body of the .csv ends up in a single row, so it is almost impossible to work with without an index.
desired output:
id, idType, suscriptionID
0. '7-84-1811', 'CIP', 21312421412
1. '1-232-42', 'IO' , 21421e324
My code:
import pandas as pd
import json
path = '/example.csv'
df = pd.read_csv(path)
normalize_df = json.load(df)
print(df)

Since your string is in JSON format, you can load it directly, drop the columns you don't need, transpose, and fix the headers:
import pandas as pd

toEscape = "{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"}}"
# resolve the escaped quotes so the string is plain JSON
json_string = toEscape.encode('utf-8').decode('unicode_escape')
df = pd.read_json(json_string)
# keep only the nested "response" fields, then turn them into a single row
df = df.drop(["code", "description"], axis=1)
df = df.transpose().reset_index().drop("index", axis=1)
df.to_csv("user_details.csv")
The output looks like this:
id idType suscriptionId
0 8-717-2346 CIP 92118213
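If the .csv actually holds one such escaped JSON string per row, a minimal sketch along these lines could build the whole table at once (the column name "raw" and the file name are assumptions, not from your post):
import json
import pandas as pd

# assumption: example.csv has one escaped JSON string per row, in a single column named "raw"
df = pd.read_csv('example.csv', names=['raw'])

def parse_response(raw):
    # undo the backslash escaping, then parse and keep only the nested "response" object
    clean = raw.encode('utf-8').decode('unicode_escape')
    return json.loads(clean)['response']

out = pd.json_normalize(df['raw'].map(parse_response).tolist())
out = out[['id', 'idType', 'suscriptionId']]
out.to_csv('user_details.csv', index=False)
This gives one row per input string, indexed 0, 1, 2, ... as in the desired output.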
Thank you for the question.


PYTHON, Pandas Dataframe: how to select and read only certain rows

To make the purpose clear, here is the code that works perfectly (of course I include only the beginning; the rest is not important here):
df = pd.read_csv(
'https://github.com/pcm-dpc/COVID-19/raw/master/dati-andamento-nazionale/'
'dpc-covid19-ita-andamento-nazionale.csv',
parse_dates=['data'], index_col='data')
df.index = df.index.normalize()
ts = df[['nuovi_positivi']].dropna()
sts = ts.nuovi_positivi
So basically it takes data from the online GitHub CSV that you may find here:
Link NAZIONALE
It looks at the column "data" (which is Italian for "date") and, for every date, extracts the value nuovi_positivi into the program.
Now I have to do the same thing with this JSON, which you may find here:
Link Json
As you may see, for every date there are now 21 different values, because Italy has 21 regions (Abruzzo, Basilicata, Campania, and so on), but I am interested ONLY in the values for the region "Veneto". I want to extract only the rows that contain "Veneto" under the label "denominazione_regione", to get the value "nuovi_positivi" for every day.
I tried with:
df = pd.read_json('https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-regioni.json',
parse_dates=['data'], index_col='data', index_row='Veneto')
df.index = df.index.normalize()
ts = df[['nuovi_positivi']].dropna()
sts = ts.nuovi_positivi
but of course it doesn't work. How to solve the problem? Thanks
Try this. pd.read_json does not accept read_csv keywords such as parse_dates, index_col, or index_row; use convert_dates for the dates, set the index after reading, and filter the region with a boolean mask:
df = pd.read_json('https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-regioni.json',
convert_dates =['data'])
df.index = df['data']
df.index = df.index.normalize()
df = df[df["denominazione_regione"] == 'Veneto']
ts = df[['nuovi_positivi']].dropna()
sts = ts.nuovi_positivi
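The same thing can also be sketched a little more compactly, filtering first and then indexing by the normalized date (same URL and column names as above):
import pandas as pd

df = pd.read_json('https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-json/dpc-covid19-ita-regioni.json',
convert_dates=['data'])

# keep only the Veneto rows, index them by (normalized) date, take the new-cases series
veneto = df[df['denominazione_regione'] == 'Veneto'].set_index('data')
veneto.index = veneto.index.normalize()
sts = veneto['nuovi_positivi'].dropna()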

Is there any method to replace specific data from a column without breaking its structure or splitting

Hi there, I am trying to figure out how to replace specific data in a CSV file. I have a file which is the base/location data for the IDs:
https://store8.gofile.io/download/5b031959-e0b0-4dbf-aec6-264e0b87fd09/service%20block.xlsx (sheet 2 has the data).
The file in which I want to replace data using the ID is below:
https://store8.gofile.io/download/6e13a19a-bac8-4d16-8692-e4435eed2a08/Serp.csv
The highlighted part needs to be deleted after filling in the location.
import pandas as pd

df1 = pd.read_excel("serp.xlsx", header=None)
df2 = pd.read_excel("flocnam.xlsx", header=None)

# each Serp row is a single ";"-separated string, so split it into columns
df1 = df1[0].str.split(";", expand=True)
# keep only the last whitespace-separated token of column 4
df1[4] = df1[4].apply(lambda x: v[-1] if (v := x.split()) else "")
# in the lookup file, keep only the part before the "-"
df2[1] = df2[1].apply(lambda x: x.split("-")[0])
# map the keys in column 4 to their replacement values
m = dict(zip(df2[1], df2[0]))
df1[4] = df1[4].replace(m)
print(df1)
df1.to_csv("test.csv")
It worked, but not how I wanted.
https://store8.gofile.io/download/c0ae7e05-c0e2-4f43-9d13-da12ddf73a8d/test.csv
I am trying to replace it like this (desired output).
Thank you for being a supportive community ❤️
If I understand correctly, you simply need to specify the ; separator:
>>> df1.to_csv('test.csv', sep=';', index_label=False)
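Since df1 was built by splitting the original rows on ";", writing it back with the same separator keeps the file's structure. A minimal sketch (index=False and header=False are assumptions here, since the input was read with header=None):
# join the split columns back with ";" and drop the numeric index and header row
df1.to_csv("test.csv", sep=";", index=False, header=False)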

Python, extracting a string between two specific characters for all rows in a dataframe

I am currently trying to write a function that will extract the string between 2 specific characters.
My data set contains emails only, which look like this: pstroulgerrn#time.com.
I am trying to extract everything after the # and everything before the ., so that the email listed above would output time.
Here is my code so far:
new = df_personal['email']  # 1000x1 dataframe of emails

def extract_company(x):
    y = []
    y = x[x.find('#')+1 : x.find('.')]
    return y

extract_company(new)
Note: If I change new to df_personal['email'][0], the correct output is displayed for that row.
However, when trying to do it for the entire dataframe, I get an error saying:
AttributeError: 'Series' object has no attribute 'find'
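A direct fix that keeps the function above unchanged is to apply it element-wise instead of to the whole Series, since find() exists on individual strings but not on a Series (the output column name 'company' is just an example):
# map the function over each email string; 'company' is an assumed column name
df_personal['company'] = df_personal['email'].apply(extract_company)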
You can extract a series of all matching texts using regex:
import pandas as pd
df = pd.DataFrame( ['kabawonga#something.whereever','kabawonga#omg.whatever'])
df.columns = ['email']
print(df)
k = df["email"].str.extract(r"#(.+)\.")
print(k)
Output:
# df
email
0 kabawonga#something.whereever
1 kabawonga#omg.whatever
# extraction
0
0 something
1 omg
See pandas.Series.str.extract
Try:
df_personal["domain"]=df_personal["email"].str.extract(r"\#([^\.]+)\.")
Outputs (for the sample data):
import pandas as pd
df_personal=pd.DataFrame({"email": ["abc#yahoo.com", "xyz.abc#gmail.com", "john.doe#aol.co.uk"]})
df_personal["domain"]=df_personal["email"].str.extract(r"\#([^\.]+)\.")
>>> df_personal
email domain
0 abc#yahoo.com yahoo
1 xyz.abc#gmail.com gmail
2 john.doe#aol.co.uk aol
You can do it with an apply function, by first splitting on . and then on # for each row:
Snippet:
import pandas as pd
df = pd.DataFrame( ['abc#xyz.dot','def#qwe.dot','def#ert.dot.dot'])
df.columns = ['email']
df["domain"] = df["email"].apply(lambda x: x.split(".")[0].split("#")[1])
Output:
df
Out[37]:
email domain
0 abc#xyz.dot xyz
1 def#qwe.dot qwe
2 def#ert.dot.dot ert
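One caveat with splitting on "." first: an address with a dot before the # (for example xyz.abc#gmail.com) raises an IndexError, because the part before the first "." no longer contains a "#". Splitting on "#" first avoids that; a small sketch with the same df as above:
# split on "#" first, then take the part of the domain before its first "."
df["domain"] = df["email"].apply(lambda x: x.split("#")[1].split(".")[0])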

overwriting the values of a column in python

I was trying to modify each string present in the column named Date_time in a dataframe. The values (string type) in that column look like this:
"40 11-02-20 11:42:36"
I was trying to delete the characters up to the first space and replace the value with "11-02-20 11:42:36". I was able to split the value, but unable to rewrite it in the same cell of that column. Here is the code I have done so far:
import numpy as np
import matplotlib as plt
import pandas as pd
dataset = pd.read_csv('20-02-11.csv')
for i in dataset.itertuples():
    print(type(i.Date_time))
    str = i.Date_time
    str1 = str.split(None, 1)[1]
    i.Date_time = str1
    print(str1)
    print(i.Date_time)
    break
and it shows an AttributeError when I try to assign str1 to i.Date_time.
Please help.
The tuples that itertuples() returns cannot (and should not) be used to set values in the original dataframe; they are copies, not the actual data of the dataframe. You can try something like this:
for i in range(len(dataset)):
    your_string = dataset.loc[i, "Date_time"]
    adjusted_string = your_string.split(None, 1)[1]
    dataset.loc[i, "Date_time"] = adjusted_string
This will use the actual data stored in the dataframe.
Or, using the df.at indexer:
for i, row in dataset.iterrows():
    your_string = row.Date_time  # or row['Date_time']
    adjusted_string = your_string.split(None, 1)[1]
    dataset.at[i, 'Date_time'] = adjusted_string
You can format the entire column at once. Starting with a dataframe like this:
df = pd.DataFrame({'date_time': ['40 11-02-20 11:42:36', '31 11-02-20 11:42:36']})
print(df)
returns
date_time
0 40 11-02-20 11:42:36
1 31 11-02-20 11:42:36
You can remove the first characters and space like this:
df['date_time'] = [i[1+len(i.split(' ')[0]):] for i in df['date_time']]
print(df)
returns
date_time
0 11-02-20 11:42:36
1 11-02-20 11:42:36
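A vectorized variant of the same idea, which may be preferable for large frames, splits each value on the first run of whitespace and keeps the remainder (same df as above):
# split on the first whitespace and keep everything after it
df['date_time'] = df['date_time'].str.split(n=1).str[1]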

DataFrame: filter string-type data with something like SQL's LIKE

I have a dataframe like this
block_name
['循环经济']
['3D打印']
['再生经济']
Now I want to get the rows whose block_name contains the word '经济'.
The result that I want is:
block_name
['循环经济']
['再生经济']
And I tried this:
df = df[('经济' in df['block_name'])]
And this:
df = df[(df['block_name'].find('经济') != -1)]
But they don't work.
How can I get this result, like SQL's LIKE "%经济%"?
Use .str.contains():
import pandas as pd

df = pd.DataFrame(['循环经济', '3D打印', '再生经济'], columns=['block_name'])
print(df[df['block_name'].str.contains('经济')])
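For the sample frame above, this keeps the two matching rows:
  block_name
0       循环经济
2       再生经济
If the column can contain missing values, str.contains('经济', na=False) keeps the mask boolean so the row selection still works.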
