How to format strings differently in pandas python?

I have some base data:
import time
import pandas as pd

df = pd.DataFrame([
    [time.strftime("%Y-%m-%d", time.gmtime(1611161411.46177)), 405.52, 39, 46, 633],
    [time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)), 406.52, 41, 103, 582],
    [time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)), 406.52, 41, 146, 544],
    [time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)), 406.52, 41, 164, 532]],
    columns=['Date', 'Balance', 'In sell', 'Quantity', 'Profit'])
This is what it looks like:
I want to apply this to every row:
df = df.style.bar()
This is how I would like my final table to look:
only with the bar formatting applied to all rows. Any help would be appreciated.

try:
df.style.bar(subset=pd.IndexSlice[1:2, ['Quantity', 'Profit']], align='mid', color=['#5fba7d'])
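If the goal is to apply the bars to every row rather than just a slice, a minimal sketch (reusing the question's data) restricts only the columns in subset:
import time
import pandas as pd

df = pd.DataFrame([
    [time.strftime("%Y-%m-%d", time.gmtime(1611161411.46177)), 405.52, 39, 46, 633],
    [time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)), 406.52, 41, 103, 582],
    [time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)), 406.52, 41, 146, 544],
    [time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)), 406.52, 41, 164, 532]],
    columns=['Date', 'Balance', 'In sell', 'Quantity', 'Profit'])

# Omitting the row part of the subset styles all rows for the chosen columns.
styled = df.style.bar(subset=['Quantity', 'Profit'], align='mid', color='#5fba7d')
styled  # a Styler renders the bars in a notebook; the underlying df is unchanged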

Related

How do I capture the properties I want from a string?

I hope you are well. I have the following string:
"{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"},....\"childProducts\":[]}}"...
I'm trying to capture the attributes id, idType and subscriptionId and map them into a dataframe, but the entire body of the .csv ends up in a single row, so it is almost impossible for me to work with it without an index.
desired output:
id, idType, suscriptionID
0. '7-84-1811', 'CIP', 21312421412
1. '1-232-42', 'IO' , 21421e324
My code:
import pandas as pd
import json
path = '/example.csv'
df = pd.read_csv(path)
normalize_df = json.load(df)
print(df)
Considering your string is in JSON format, you can do this: drop the unneeded columns, transpose, and get the headers right.
import pandas as pd

toEscape = "{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"}}"
json_string = toEscape.encode('utf-8').decode('unicode_escape')
df = pd.read_json(json_string)
df = df.drop(["code", "description"], axis=1)
df = df.transpose().reset_index().drop("index", axis=1)
df.to_csv("user_details.csv")
the output looks like this:
id idType suscriptionId
0 8-717-2346 CIP 92118213
Thank you for the question.
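As an alternative sketch (not from the original answer): if each escaped JSON document sits in a CSV column, one could parse the rows with json.loads and flatten the nested response object with pd.json_normalize. The column name "body" below is illustrative.
import json
import pandas as pd

# Hypothetical frame standing in for the CSV: one escaped JSON document per row.
raw = pd.DataFrame({"body": [
    "{\"code\":0,\"description\":\"Done\",\"response\":{\"id\":\"8-717-2346\",\"idType\":\"CIP\",\"suscriptionId\":\"92118213\"}}",
]})

# If the cell text contains literal backslash escapes, decode them first
# (e.g. s.encode('utf-8').decode('unicode_escape')) as in the answer above.
parsed = raw["body"].apply(json.loads)       # one dict per row
df = pd.json_normalize(parsed.tolist())      # flattens nested keys as response.id, ...
df = df[["response.id", "response.idType", "response.suscriptionId"]]
df.columns = ["id", "idType", "suscriptionId"]
print(df)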

Fetching 2 columns from Tuple using Python

I have a dataframe whose rows look like this when I iterate through them:
for row in df.itertuples(index=False, name=None):
    print(row)
Output:
(100214, '120.6843686', '-41.9098438')
(101105, '121.7692179', '-42.2737880')
(101847, '122.6417215', '-43.8718865')
Output Desired:
('120.6843686', '-41.9098438')
('121.7692179', '-42.2737880')
('122.6417215', '-43.8718865')
I am new to Python, so any help would really be appreciated.
Thanks..
Use the following code:
for row in df.itertuples(index=False, name=None):
    print(row[1:])
This slices the tuple and displays everything after column 0. This article explains it in further detail if you're interested.
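For illustration, plain tuple slicing works the same way outside pandas:
row = (100214, '120.6843686', '-41.9098438')
print(row[1:])  # ('120.6843686', '-41.9098438') -- everything after index 0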
If you are just trying to get values here's a simple way:
import pandas as pd
df = pd.DataFrame((
    (100214, '120.6843686', '-41.9098438'),
    (101105, '121.7692179', '-42.2737880'),
    (101847, '122.6417215', '-43.8718865'))
)
df = df.iloc[:, 1:].values.tolist()
print(df)
[['120.6843686', '-41.9098438'],
['121.7692179', '-42.2737880'],
['122.6417215', '-43.8718865']]
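A related hedged variant combines the two answers: drop the first column up front and then iterate, so each tuple already has only the two columns you want.
import pandas as pd

df = pd.DataFrame((
    (100214, '120.6843686', '-41.9098438'),
    (101105, '121.7692179', '-42.2737880'),
    (101847, '122.6417215', '-43.8718865')))

# Keep every column after the first, then iterate as plain tuples.
for row in df.iloc[:, 1:].itertuples(index=False, name=None):
    print(row)
# ('120.6843686', '-41.9098438')
# ('121.7692179', '-42.2737880')
# ('122.6417215', '-43.8718865')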

removing rows with given criteria

I am a beginner with both Python and pandas and I came across an issue I can't handle on my own.
What I am trying to do is:
1) remove all the columns except the three that I am interested in
2) remove all rows whose "asset number" column contains any of several strings. And here is the difficult part: I removed all the blanks, but I can't remove the other ones because nothing happens (for example with the string "TECHNOLOGIES" - I tried part of the word and the whole word, and neither works).
Here is the code:
import modin.pandas as pd
File1 = 'abi.xlsx'
df = pd.read_excel(File1, sheet_name = 'US JERL Dec-19')
df = df[['asset number','Cost','accumulated depr']] #removing other columns
df = df.dropna(axis=0, how='any', thresh=None, subset=None, inplace = False)
df = df[~df['asset number'].str.contains("TECHNOLOGIES, INC", na=False)]
df.to_excel("abi_output.xlsx")
And besides that, the file has 600k rows and it loads very slowly before I can see the output. Do you have any advice for that?
Thank you!
@Kenan - thank you for your answer. The code now looks like the below, but it still doesn't remove rows that contain the specified strings in the chosen column. I also attached a screenshot of the output to show you that the rows still exist. Any thoughts?
import modin.pandas as pd
File1 = 'abi.xlsx'
df = pd.read_excel(File1, sheet_name = 'US JERL Dec-19', usecols=['asset number','Cost','accumulated depr'])
several_strings = ['', 'TECHNOLOGIES', 'COST CENTER', 'Account', '/16']
df = df[~df['asset number'].isin(several_strings)]
df.to_excel("abi_output.xlsx")
The rows are still not deleted.
@Andy
I attach a sample of the input file. I just changed the numbers in two columns because they are confidential, and removed the columns that aren't needed (removing them with code wasn't a problem).
Here is the link. Let me know if this is not working properly.
You can combine your first two steps with:
df = pd.read_excel(File1, sheet_name = 'US JERL Dec-19', usecols=['asset number','Cost','accumulated depr'])
I assume this is what you're trying to remove:
several_strings = ['TECHNOLOGIES, INC','blah','blah']
df = df[~df['asset number'].isin(several_strings)]
df.to_excel("abi_output.xlsx")
Update
Based on the link you provided this might be a better approach
df = df[df['asset number'].str.len().eq(7)]
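If the goal really is substring matching rather than matching exact cell values, a minimal sketch (with a made-up sample column) joins the substrings into one regex and uses str.contains instead of isin:
import re
import pandas as pd

# Hypothetical sample standing in for the 'asset number' column.
df = pd.DataFrame({"asset number": ["1234567", "ABC TECHNOLOGIES, INC", "COST CENTER 12", "7654321"]})

substrings = ["TECHNOLOGIES", "COST CENTER", "Account", "/16"]
pattern = "|".join(re.escape(s) for s in substrings)  # escape so each piece is matched literally

# isin() only drops exact matches; str.contains drops any row containing one of the substrings.
df = df[~df["asset number"].astype(str).str.contains(pattern, na=False)]
print(df)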
The code you've given is correct, so I guess there may be something wrong with the strings in your 'asset number' column. Can you give some examples for a code check?

Split data separated by comma into lists in python

my dataframe df looks like this
Row_ID Codes
=============
1 A123,B456,C678
2 X359,C678,F23
3 J3,D24,J36,K994
I want to put all Codes in a list
something like this
['A123', 'B456', 'C678'],['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']
I did this
# an empty list
CodeList = []
for i in df['Codes']:
    CodeList.append(list(i))
but what I get is this
['A','1','2','3','B'....
How can I do it the right way as mentioned above?
import pandas as pd
data = {"Codes": ["A123, B456, C678", "X359, C678, F23", "J3, D24, J36, K994"]}
df = pd.DataFrame(data)
result = [a.split(", ") for a in df["Codes"]]
print(result)
output
[['A123', 'B456', 'C678'], ['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']]
Try splitting using the following:
CodeList.append(i.split(','))
It seems like many of the other answers here might just be plain wrong. (Edit: Currently, they all are)
This code does work:
import pandas as pd
data = {'Codes': ['A123,B456,C678', 'X359,C678,F23', 'J3,D24,J36,K994']}
df = pd.DataFrame(data)
codes_list = df['Codes'].str.split(',').tolist()
codes_list looks like:
[['A123', 'B456', 'C678'], ['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']]
Note that this solution is idiomatic Pandas, whereas explicit loops should be avoided whenever possible.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 2), columns=list('AB'))
print(df.head())
print(df.values.tolist())
output:
[[-0.2645782053241853, 0.5022937587041725], [1.624868960959602, 0.5086915380333786], [1.3593608874498997, 0.7077939622903995]]
Just replace list(i) with i.split(',') in the line CodeList.append(list(i)):
CodeList = []
for i in df['Codes']:
    CodeList.append(i.split(','))
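Since some answers split on ',' and others on ', ', a defensive sketch (assuming the codes may or may not have spaces after the commas) strips each piece after splitting:
import pandas as pd

df = pd.DataFrame({"Codes": ["A123,B456,C678", "X359, C678, F23", "J3,D24, J36,K994"]})

# Split on commas, then strip any surrounding whitespace from each code.
code_list = [[code.strip() for code in row.split(",")] for row in df["Codes"]]
print(code_list)
# [['A123', 'B456', 'C678'], ['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']]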

DataFrame: filter string data with something like SQL's LIKE

I have a dataframe like this
block_name
['循环经济']
['3D打印']
['再生经济']
Now I want to get the rows where block_name contains the word '经济'.
The result that I want is:
block_name
['循环经济']
['再生经济']
And I tried this:
df = df[('经济' in df['block_name'])]
And this:
df = df[(df['block_name'].find('经济') != -1)]
But they don't work.
How can I get this result, like SQL's LIKE "%经济%"?
Use .str.contains()
import pandas as pd
df = pd.DataFrame(['循环经济', '3D打印', '再生经济'], columns=['block_name'])
print(df[df['block_name'].str.contains('经济')])
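Note that in the question the column appears to hold one-element lists like ['循环经济'] rather than plain strings; a hedged sketch for that case joins each list into a string before filtering:
import pandas as pd

df = pd.DataFrame({'block_name': [['循环经济'], ['3D打印'], ['再生经济']]})

# Join each list into a single string, then filter with str.contains as above.
mask = df['block_name'].apply(lambda x: ''.join(x)).str.contains('经济')
print(df[mask])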
