I have an object as my output. Now I want to split that output and create a DataFrame from the values.
This is the output I work with:
Seriennummer
701085.0 ([1525.5804581812297, 255.9005481721001, 0.596...
701086.0 ([1193.0420594479258, 271.17468806239793, 0.65...
701087.0 ([1265.5151604213813, 217.26487934586433, 0.60...
701088.0 ([1535.8282855508626, 200.6196628705149, 0.548...
701089.0 ([1500.4964672930257, 247.8883736673866, 0.583...
701090.0 ([1203.6453723293514, 258.5749562983118, 0.638...
701091.0 ([1607.1851164005993, 209.82194423587782, 0.56...
701092.0 ([1711.7277933836879, 231.1560159770871, 0.567...
dtype: object
This is what I am doing, and my attempt to split the output:
import numpy as np
import scipy.optimize as opt

x = df.T.iloc[1]
y = df.T.iloc[2]

def logifunc(x, c, a, b):
    return c / (1 + (a) * np.exp(-b*(x)))

result = df.groupby('Seriennummer').apply(lambda grp:
    opt.curve_fit(logifunc, grp.mrwSmpVWi, grp.mrwSmpP, p0=[110, 400, -2]))
print(result)

for element in result:
    parts = element.split(',')
    print(parts)
It doesn't work. I get the error:
AttributeError: 'tuple' object has no attribute 'split'
@jezrael It works. Now it shows a lot of data I don't need. Do you have an idea how I can drop the data I don't need?
Seriennummer 0 1 2
701085.0 1525.5804581812297 255.9005481721001 0.5969011082719918
701085.0 [ 9.41414894e+03 -2.07982124e+03 -2.30130078e+00] [-2.07982124e+03 1.44373786e+03 9.59282709e-01] [-2.30130078e+00 9.59282709e-01 7.75807643e-04]
701086.0 1193.0420594479258 271.17468806239793 0.6592054681687264
701086.0 [ 5.21906135e+03 -2.23855187e+03 -2.11896425e+00] [-2.23855187e+03 2.61036500e+03 1.67396324e+00] [-2.11896425e+00 1.67396324e+00 1.22581746e-03]
701087.0 1265.5151604213813 217.26487934586433 0.607183527397275
Use Series.explode with the DataFrame constructor:
s = result.explode()
df1 = pd.DataFrame(s.tolist(), index=s.index)
If the data is small and/or performance is not important:
df1 = result.explode().apply(pd.Series)
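If you only need the fitted parameters and want to drop the covariance matrices (the extra data mentioned in the comment above), a minimal sketch, assuming each value in result is the (popt, pcov) tuple returned by curve_fit and naming the columns after logifunc's parameters:
# keep only popt, the first element of each (popt, pcov) tuple
params = result.apply(lambda t: t[0])
df1 = pd.DataFrame(params.tolist(), index=params.index, columns=['c', 'a', 'b'])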
I'm trying to execute a filter in Python, but I'm stuck at the end, when I need to group the result.
I have a json, which is this one: https://api.jsonbin.io/b/62300664a703bb67492bd3fc/3
And what I'm trying to do with it is filter "apiFamily", searching for "payments-ted" or "payments-doc". If I find a match, I then must verify that the column "ApiEndpoints" has at least two endpoints in it.
My ultimate goal is to append both "apiFamily" values in one row and all the "ApiEndpoints" in another row. Something like this:
"ApiFamily": [
"payments-ted",
"payments-doc"
]
"ApiEndpoints": [
"/ted",
"/electronic-ted",
"/phone-ted",
"/banking-ted",
"/shared-automated-teller-machines-ted"
"/doc",
"/electronic-doc",
"/phone-doc",
"/banking-doc",
"/shared-automated-teller-machines-doc"
]
I have managed to achieve partial success, searching for a single condition:
ApiFilter = df[(df['ApiFamily'] == 'payments-pix') & (rolesFilter['ApiEndpoints'].apply(lambda x: len(x)) >= 2)]
This obviously extracts only payments-pix which contains two or more ApiEndpoints.
Now I can manage to check both conditions if I try this:
ApiFilter = df[((df['ApiFamily'] == 'payments-ted') | (df['ApiFamily'] == 'payments-doc')) & (df['ApiEndpoints'].apply(lambda x: len(x)) >= 2)]
I will get the correct rows, but it will obviously list the two families in separate rows.
When I try to groupby the result, all I get is this:
TypeError: unhashable type: 'Series'
My doubt is: how do I avoid this error? I assume I must do some sort of conversion of the columns that have multiple items inside a row, but what is the best method?
I have tried this solution; it is kind of roundabout, but it gets the final result you want.
First get the data into a dictionary object
>>> import requests
>>> url = 'https://api.jsonbin.io/b/62300664a703bb67492bd3fc/3'
>>> response = requests.get(url)
>>> d = response.json()
We just need the ApiFamily and ApiEndpoints into a new dictionary
>>> dNew = {}
>>> for item in d['data']:
...     if item['ApiFamily'] in ['payments-ted', 'payments-doc']:
...         dNew[item['ApiFamily']] = item['ApiEndpoints']
Change dNew into a dataframe and transpose it.
>>> df1 = pd.DataFrame(dNew)
>>> df1 = df1.applymap(lambda x: '\'' + x + '\'')
>>> df2 = df1.transpose()
At this stage df2 looks like this -
>>> print(df2)
0 1 2 3 \
payments-ted '/ted' '/electronic-ted' '/phone-ted' '/banking-ted'
payments-doc '/doc' '/electronic-doc' '/phone-doc' '/banking-doc'
4
payments-ted '/shared-automated-teller-machines-ted'
payments-doc '/shared-automated-teller-machines-doc'
Now join all the columns using the comma symbol
>>> df2['final'] = df2.apply(','.join, axis=1)
Finally
>>> df2 = df2[['final']]
>>> print(df2)
final
payments-ted '/ted','/electronic-ted','/phone-ted','/bankin...
payments-doc '/doc','/electronic-doc','/phone-doc','/bankin...
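A more pandas-native sketch of the same idea, assuming df holds the records with 'ApiFamily' as strings and 'ApiEndpoints' as lists; grouping on list-valued cells is a common source of the unhashable-type error, so this aggregates with plain Python instead:
import pandas as pd
mask = df['ApiFamily'].isin(['payments-ted', 'payments-doc']) & (df['ApiEndpoints'].str.len() >= 2)
filtered = df[mask]
# one row: all matching families in one cell, all endpoints flattened into another
combined = pd.DataFrame({'ApiFamily': [filtered['ApiFamily'].tolist()],
                         'ApiEndpoints': [sum(filtered['ApiEndpoints'], [])]})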
I formed a dataset where the two columns are edge lists taken from another dataset, and I mistakenly formed one of the columns as an Int64Index type when extracting the indexes, as pictured here.
I am trying to extract the numbers from each cell, but run into problems. When I try to handle the number as a string using the int() command, I get an error that 'DataFrame' object is not callable. However, when I try to use a pandas DataFrame command, such as to_numeric(), I get an AttributeError: 'str' object has no attribute 'to_numeric'.
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array(["boo", "foo", "bar"]), columns=['col1'])
d = {'col1': ["boo", "boo", "boo", "bar", "foo", "bar"], 'Title': ["no", "yes", "stop", "yes", "stop", "go"], 'Example': ["p", "y", "x", "f", "v", "g"]}
df2 = pd.DataFrame(data=d)
d = {'Example': ["p", "y", "x", "f", "v", "g"], 'Title': ["no", "yes", "stop", "yes", "stop", "go"]}
df3 = pd.DataFrame(data=d)

edges = pd.DataFrame(columns=['source', 'target'])  # not shown in the original; needed for the loop below
for i in range(0, len(df1)):
    val = df1['col1'][i]
    stuff = df2[df2["col1"] == val]
    stuff = stuff.reset_index()
    for k in range(0, len(stuff)):
        s = stuff["Example"][k]
        p = stuff["Title"][k]
        j = df2.index[(df2['Example'] == s) & (df2['Title'] == p)]
        edges = edges.append({'source': i, 'target': j}, ignore_index=True)

m = edges['target'].tolist()[3]
print(m)
(output shown as an image in the original post)
import re
pattern = re.compile("[\d*]")
print(type("".join(pattern.findall(m)[2:-2])))
"".join(pattern.findall(m)[2:-2])
(output image)
int("".join(pattern.findall(m)[2:-2]))
(output image)
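A more direct route, sketched under the assumption that j above is a pandas Index object (which is what df2.index[...] returns): pull the integer out of the Index itself instead of regex-parsing its printed form.
j = df2.index[(df2['Example'] == s) & (df2['Title'] == p)]
target = int(j[0])  # first matching label as a plain Python int; assumes at least one match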
I tried to convert my index to a column, but I get the error: AttributeError: 'DataFrame' object has no attribute 'reset_Seriennummer'. It should be simple, but it doesn't work.
My index is not called 'index', but it is used the same way:
My df:
Seriennummer 0
701085.0 "(array([1.52558046e+03, 2.55900548e+02, 5.96901108e-01]), array([[ 9.41414894e+03, -2.07982124e+03, -2.30130078e+00],
[-2.07982124e+03, 1.44373786e+03, 9.59282709e-01],
[-2.30130078e+00, 9.59282709e-01, 7.75807643e-04]]))"
701086.0 "(array([1.19304206e+03, 2.71174688e+02, 6.59205468e-01]), array([[ 5.21906135e+03, -2.23855187e+03, -2.11896425e+00],
[-2.23855187e+03, 2.61036500e+03, 1.67396324e+00],
[-2.11896425e+00, 1.67396324e+00, 1.22581746e-03]]))"
What I tried so far:
df['Seriennummer'] = df.Seriennummer
or
df.reset_Seriennummer(level=0, inplace=True)
This will work:
df.reset_index(level='Seriennummer')
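For context, reset_index is the generic method; the index level's name is passed as an argument rather than forming part of the method name. A minimal self-contained sketch (with made-up data):
import pandas as pd
df = pd.DataFrame({'col0': ['a', 'b']},
                  index=pd.Index([701085.0, 701086.0], name='Seriennummer'))
df = df.reset_index(level='Seriennummer')  # 'Seriennummer' becomes a regular column
print(df.columns.tolist())  # ['Seriennummer', 'col0']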
I have a dataset, where the second column looks like this.
FileName
892e7c8382943342a29a6ae5a55f2272532d8e04.exe.asm
2d42c1b2c33a440d165683eeeec341ebf61218a1.exe.asm
1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed.exe.asm
Now, I want to extract the name before ".exe.asm" from the column and append it to a new list for all the rows of my dataset. I tried the following code:
import pandas as pd
df = pd.read_csv("dataset1.csv")
exekey = []
for row in df.iterrows():
    exekey.append(row[1].split('.'))
exekey
This execution gave me the following error:
AttributeError: 'Series' object has no attribute 'split'
I am not able to do it. Please help.
Split the filename on "." and access the 1st element using indexing.
import pandas as pd
df = pd.DataFrame({'FileName':['892e7c8382943342a29a6ae5a55f2272532d8e04.exe.asm',
'2d42c1b2c33a440d165683eeeec341ebf61218a1.exe.asm',
'1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed.exe.asm']})
exekey = [i.split(".")[0] for i in df['FileName']]
print(exekey)
Alternate way:
exekey2 = df['FileName'].apply(lambda x: x.split(".")[0]).tolist()
Output:
['892e7c8382943342a29a6ae5a55f2272532d8e04', '2d42c1b2c33a440d165683eeeec341ebf61218a1', '1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed']
You can use map like this to split on "." and take index 0:
df['FileName'].map(lambda f : f.split('.')[0])
# Output
0 892e7c8382943342a29a6ae5a55f2272532d8e04
1 2d42c1b2c33a440d165683eeeec341ebf61218a1
2 1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed
Name: FileName, dtype: object
If you want to get a list of names you can do,
df['FileName'].map(lambda f : f.split('.')[0]).values.tolist()
# Output : ['892e7c8382943342a29a6ae5a55f2272532d8e04',
'2d42c1b2c33a440d165683eeeec341ebf61218a1',
'1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed']
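A vectorized variant using the .str accessor avoids the Python-level lambda entirely:
df['FileName'].str.split('.').str[0].tolist()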
I'm attempting to create a raw string variable from a pandas dataframe, which will eventually be written to a .cfg file, by first joining two columns together as shown below while avoiding None:
Section of df:
command value
...
439 sensitivity "0.9"
440 cl_teamid_overhead_always 1
441 host_writeconfig None
...
code:
...
df = df['value'].replace('None', np.nan, inplace=True)
print df
df = df['command'].astype(str)+' '+df['value'].astype(str)
print df
cfg_output = '\n'.join(df.tolist())
print cfg_output
I've attempted to replace all the None values with NaN first, so that no lines in cfg_output contain "None" as part of the string. However, by doing so I seem to get a few undesired results. I made use of print statements to see what is going on.
It seems that df = df['value'].replace('None', np.nan, inplace=True) simply outputs None.
It seems that df = df['command'].astype(str)+' '+df['value'].astype(str) and cfg_output = '\n'.join(df.tolist()) cause the following error:
TypeError: 'NoneType' object has no attribute '__getitem__'
Therefore, I was thinking that by ignoring any occurrences of NaN, the code may run smoothly, although I'm unsure how to do so using pandas.
Ultimately, my desired output would be as follows:
sensitivity "0.9"
cl_teamid_overhead_always 1
host_writeconfig
First of all, df['value'].replace('None', np.nan, inplace=True) returns None because you're calling the method with the inplace=True argument. This argument tells replace not to return anything and instead modify the original object in place, similar to how list.append or list.sort return None and mutate the list.
With that being said, you can also get the desired output by calling fillna with an empty string:
import pandas as pd
import numpy as np
d = {
'command': ['sensitivity', 'cl_teamid_overhead_always', 'host_writeconfig'],
'value': ['0.9', 1, None]
}
df = pd.DataFrame(d)
# df['value'].replace('None', np.nan, inplace=True)
df = df['command'].astype(str) + ' ' + df['value'].fillna('').astype(str)
cfg_output = '\n'.join(df.tolist())
>>> print(cfg_output)
sensitivity 0.9
cl_teamid_overhead_always 1
host_writeconfig
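One small detail: joining with ' ' leaves a trailing space on rows whose value is empty (e.g. "host_writeconfig "). If that matters for the .cfg file, a sketch that trims it with str.strip:
df = (df['command'].astype(str) + ' ' + df['value'].fillna('').astype(str)).str.strip()
cfg_output = '\n'.join(df.tolist())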
You can replace None with '':
df = df.replace('None', '')
df['command'].astype(str) + ' ' + df['value'].astype(str)
Out[436]:
439 sensitivity 0.9
440 cl_teamid_overhead_always 1
441 host_writeconfig
dtype: object