Fetching 2 columns from a tuple using Python

I have a DataFrame whose rows look like this when I iterate over them as tuples:
for row in df.itertuples(index=False, name=None):
    print(row)
Output:
(100214, '120.6843686', '-41.9098438')
(101105, '121.7692179', '-42.2737880')
(101847, '122.6417215', '-43.8718865')
Desired output:
('120.6843686', '-41.9098438')
('121.7692179', '-42.2737880')
('122.6417215', '-43.8718865')
I am new to Python, so any help would really be appreciated.
Thanks.

Use the following code:
for row in df.itertuples(index=False, name=None):
    print(row[1:])
This slices each tuple and prints everything from index 1 onward, i.e. every column after column 0. This article explains slicing in further detail if you're interested.
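Slicing works because each row produced by itertuples(index=False, name=None) is a plain Python tuple. A minimal sketch on one hard-coded row (the names second and third are illustrative only), also showing tuple unpacking as an alternative when you want the two values in separate variables:

```python
# One row as produced by df.itertuples(index=False, name=None)
row = (100214, '120.6843686', '-41.9098438')

# Slice: everything from index 1 onward, dropping the first column
print(row[1:])

# Alternative: unpack the row and ignore the first value with "_"
_, second, third = row
print((second, third))
```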

If you just need the values, here's a simple way:
import pandas as pd
df = pd.DataFrame((
(100214, '120.6843686', '-41.9098438'),
(101105, '121.7692179', '-42.2737880'),
(101847, '122.6417215', '-43.8718865'))
)
values = df.iloc[:, 1:].values.tolist()  # all rows, every column except the first
print(values)
# Output:
# [['120.6843686', '-41.9098438'],
#  ['121.7692179', '-42.2737880'],
#  ['122.6417215', '-43.8718865']]
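If you still want to iterate row by row, another option (a sketch, using the same sample data) is to drop the first column with iloc before calling itertuples, so every tuple already has the shape you want:

```python
import pandas as pd

df = pd.DataFrame([
    (100214, '120.6843686', '-41.9098438'),
    (101105, '121.7692179', '-42.2737880'),
    (101847, '122.6417215', '-43.8718865'),
])

# Keep every column except the first, then iterate as plain tuples
rows = list(df.iloc[:, 1:].itertuples(index=False, name=None))
for row in rows:
    print(row)
```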


How do I capture the properties I want from a string?

I hope you are well. I have the following string:
"{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"},....\"childProducts\":[]}}"...
I'm trying to capture the attributes id, idType and suscriptionId from it and map them into a DataFrame, but the whole body of the .csv ends up in a single row, so it is almost impossible for me to work with it without an index.
Desired output:
id, idType, suscriptionID
0. '7-84-1811', 'CIP', 21312421412
1. '1-232-42', 'IO' , 21421e324
My code:
import pandas as pd
import json
path = '/example.csv'
df = pd.read_csv(path)
normalize_df = json.load(df)
print(df)
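Since your string is in JSON format, you can parse it with pandas, then drop the unneeded columns, transpose, and fix the headers:
import pandas as pd
from io import StringIO
toEscape = "{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"}}"
# Undo literal backslash escapes (a no-op for this literal, needed if the text was read with \" sequences)
json_string = toEscape.encode('utf-8').decode('unicode_escape')
# Newer pandas versions expect a file-like object instead of a literal JSON string
df = pd.read_json(StringIO(json_string))
# Keep only the nested "response" fields
df = df.drop(["code", "description"], axis=1)
# Turn the response keys into columns and use a clean 0-based index
df = df.transpose().reset_index(drop=True)
df.to_csv("user_details.csv")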
The output looks like this:
id idType suscriptionId
0 8-717-2346 CIP 92118213
Thank you for the question.
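The answer above handles a single string, while the question's CSV holds several records. A sketch for that case, assuming each record is a separate JSON string in a column here called payload (a hypothetical name): parse each one with json.loads and build the frame from the nested response dicts. The second record reuses values from the desired output and is illustrative only.

```python
import json
import pandas as pd

# Hypothetical input: one JSON document per CSV row, in a column named "payload".
raw = pd.DataFrame({"payload": [
    '{"code":0,"description":"Done","response":{"id":"8-717-2346","idType":"CIP","suscriptionId":"92118213"}}',
    '{"code":0,"description":"Done","response":{"id":"1-232-42","idType":"IO","suscriptionId":"21421e324"}}',
]})

# Parse each string and keep only the nested "response" object (a dict)
responses = [json.loads(text)["response"] for text in raw["payload"]]

# A list of dicts becomes one row per record, with the dict keys as columns
result = pd.DataFrame(responses, columns=["id", "idType", "suscriptionId"])
print(result)
```

Building the DataFrame from a list of dicts gives one row per record, which avoids the single-row problem described in the question.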

How to format strings differently in pandas python?

I have some data:
import time
import pandas as pd
df = pd.DataFrame([
[time.strftime("%Y-%m-%d", time.gmtime(1611161411.46177)),405.52,39,46,633],
[time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)),406.52,41,103,582],
[time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)),406.52,41,146,544],
[time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)),406.52,41,164,532]], columns=['Date','Balance',"In sell","Quantity","Profit"])
This is what it looks like: [image not included]
I want to apply the following to each row:
df = df.style.bar()
This is how I would like my final table to look: [image not included]
The same, but with the formatting applied to all rows. I would appreciate any help with this.
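Try:
# subset restricts the bars to the rows labeled 1 to 2 (inclusive) of the Quantity and Profit columns
df.style.bar(subset=pd.IndexSlice[1:2, ['Quantity', 'Profit']], align='mid', color=['#5fba7d'])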

Read and split column values from a DataFrame

I have a dataset whose second column looks like this:
FileName
892e7c8382943342a29a6ae5a55f2272532d8e04.exe.asm
2d42c1b2c33a440d165683eeeec341ebf61218a1.exe.asm
1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed.exe.asm
Now I want to extract the name before ".exe.asm" from that column and append it to a new list, for every row of my dataset. I tried the following code:
import pandas as pd
df = pd.read_csv("dataset1.csv")
exekey = []
for row in df.iterrows():
    exekey.append(row[1].split('.'))
exekey
This execution gave me the following error:
AttributeError: 'Series' object has no attribute 'split'
I haven't been able to get this working. Please help.
After changing the code, the output looked like this: [image not included]
Split each filename on "." and take the first element by indexing:
import pandas as pd
df = pd.DataFrame({'FileName':['892e7c8382943342a29a6ae5a55f2272532d8e04.exe.asm',
'2d42c1b2c33a440d165683eeeec341ebf61218a1.exe.asm',
'1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed.exe.asm']})
exekey = [i.split(".")[0] for i in df['FileName']]
print(exekey)
Alternatively:
exekey2 = df['FileName'].apply(lambda x: x.split(".")[0]).tolist()
Output:
['892e7c8382943342a29a6ae5a55f2272532d8e04', '2d42c1b2c33a440d165683eeeec341ebf61218a1', '1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed']
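The loop in the question fails because DataFrame.iterrows() yields (index, row) pairs, so row[1] is the whole row as a Series, which has no split method. A sketch of the same loop with that fixed, assuming the column is named FileName as in the question:

```python
import pandas as pd

df = pd.DataFrame({'FileName': [
    '892e7c8382943342a29a6ae5a55f2272532d8e04.exe.asm',
    '2d42c1b2c33a440d165683eeeec341ebf61218a1.exe.asm',
]})

exekey = []
# iterrows() yields (index, row) pairs; row is a Series, so pick the column by name
for _, row in df.iterrows():
    exekey.append(row['FileName'].split('.')[0])
print(exekey)
```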
You can use map to split on "." and take index 0:
df['FileName'].map(lambda f : f.split('.')[0])
# Output
0 892e7c8382943342a29a6ae5a55f2272532d8e04
1 2d42c1b2c33a440d165683eeeec341ebf61218a1
2 1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed
Name: FileName, dtype: object
If you want a list of names, you can do:
df['FileName'].map(lambda f: f.split('.')[0]).values.tolist()
# Output: ['892e7c8382943342a29a6ae5a55f2272532d8e04',
#          '2d42c1b2c33a440d165683eeeec341ebf61218a1',
#          '1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed']
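pandas also offers vectorized string methods that avoid a Python lambda. A sketch using Series.str.split and Series.str.replace; the regex variant assumes every name ends in exactly ".exe.asm":

```python
import pandas as pd

df = pd.DataFrame({'FileName': [
    '892e7c8382943342a29a6ae5a55f2272532d8e04.exe.asm',
    '2d42c1b2c33a440d165683eeeec341ebf61218a1.exe.asm',
    '1fbab6b4566a2465a8668bbfed21c0bfaa2c2eed.exe.asm',
]})

# Split every value on '.' and take the first piece
names = df['FileName'].str.split('.').str[0].tolist()

# Stricter variant: strip only a trailing ".exe.asm" suffix
stripped = df['FileName'].str.replace(r'\.exe\.asm$', '', regex=True).tolist()
print(names)
```

The split version keeps only the text before the first dot; the replace version is safer if a name could itself contain a dot.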

pyspark RDD to DataFrame

I am new to Spark.
I have a DataFrame, and I used the following code to group it by 'userid':
def test_groupby(df):
    return list(df)

high_volumn = self.df.filter(self.df.outmoney >= 1000).rdd.groupBy(
    lambda row: row.userid).mapValues(test_groupby)
It gives an RDD with the following structure:
(326033430, [Row(userid=326033430, poiid=u'114233866', _mt_datetime=u'2017-06-01 14:54:48', outmoney=1127.0, partner=2, paytype=u'157', locationcity=u'\u6f4d\u574a', locationprovince=u'\u5c71\u4e1c\u7701', location=None, dt=u'20170601')])
326033430 is the group key (the userid).
My question is: how can I convert this RDD back to a DataFrame? If I cannot do that, how can I get values from the Row objects?
Thank you.
You should just do the grouping with the DataFrame API instead:
from pyspark.sql.functions import collect_list
high_volumn = (self.df
    .filter(self.df.outmoney >= 1000)
    .groupBy('userid')
    .agg(collect_list('col')))  # 'col' is a placeholder for the column you want to collect
In the .agg method, pass whatever aggregation you want to apply to the rest of the data.
See the documentation: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.agg

Exporting Pandas DataFrame with MultiIndex

I have just discovered pandas and am impressed by its capabilities.
I am having difficulties understanding how to work with DataFrame with MultiIndex.
I have two questions:
(1) Exporting the DataFrame
Here is my problem, using this dataset:
import pandas as pd
from io import StringIO  # Python 3; Python 2 used the StringIO module
d1 = StringIO(
"""Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd"""
)
df = pd.read_csv(d1)
# Frequencies tables
tab1 = pd.crosstab(df.Gender, df.Region)
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])
tab3 = pd.crosstab([df.Gender, df.Employed], [df.Region, df.Degree])
# Now we export the datasets
tab1.to_excel('H:/test_tab1.xlsx') # OK
tab2.to_excel('H:/test_tab2.xlsx') # fails
tab3.to_excel('H:/test_tab3.xlsx') # fails
One workaround I could think of is to flatten the column names (the way R does):
def NewColums(DFwithMultiIndex):
    NewCol = []
    for item in DFwithMultiIndex.columns:
        # join each (Region, Degree) tuple into a single "region-degree" label
        NewCol.append('-'.join(item))
    return NewCol
# New Columns
tab2.columns = NewColums(tab2)
tab3.columns = NewColums(tab3)
# New export
tab2.to_excel('H:/test_tab2.xlsx') # OK
tab3.to_excel('H:/test_tab3.xlsx') # OK
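On the efficiency question, a more compact way to flatten the header, as a sketch: Index.map can apply '-'.join to every column tuple in one call, replacing the NewColums helper (the name flat is illustrative):

```python
from io import StringIO
import pandas as pd

data = StringIO("""Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
""")
df = pd.read_csv(data)
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# Join each (Region, Degree) tuple into a single "region-degree" label
flat = tab2.copy()
flat.columns = flat.columns.map('-'.join)
print(list(flat.columns))
```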
My question is: is there a more efficient way to do this in pandas that I missed in the documentation?
(2) Selecting columns
This new structure does not let me select columns by a given variable (the advantage of hierarchical indexing in the first place). How can I select the columns containing a given string (e.g. '-ba')?
P.S.: I have seen a related question but did not understand the proposed answer.
This looks like a bug in to_excel. For now, as a workaround, I would recommend using to_csv, which does not seem to have this issue.
I added this as an issue on GitHub.
To answer the second question, if you really need to use to_excel...
You can filter the column names to keep only those that include '-ba' (a list comprehension also works in Python 3, where filter returns an iterator rather than a list):
In [21]: [c for c in tab2.columns if '-ba' in c]
Out[21]: ['east-ba', 'north-ba', 'south-ba']
In [22]: tab2[[c for c in tab2.columns if '-ba' in c]]
Out[22]:
east-ba north-ba south-ba
Gender
f 1 0 1
m 1 1 0
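Question (2) can also be answered without flattening the columns at all. A sketch using the question's data: DataFrame.xs selects by one level of the column MultiIndex (crosstab names the levels Region and Degree after the input Series), and for already flattened names, Index.str.contains builds a boolean mask:

```python
from io import StringIO
import pandas as pd

data = StringIO("""Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
""")
df = pd.read_csv(data)
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# Keep the MultiIndex and take every column whose Degree level is 'ba'
ba = tab2.xs('ba', axis=1, level='Degree')
print(ba)

# On flattened "region-degree" labels, a boolean mask does the same job
flat = tab2.copy()
flat.columns = ['-'.join(c) for c in flat.columns]
ba_flat = flat.loc[:, flat.columns.str.contains('-ba', regex=False)]
print(ba_flat)
```

Keeping the MultiIndex and selecting with xs preserves the hierarchical structure; flattening is then only needed at export time.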
