Split data separated by comma into lists in Python

My dataframe df looks like this:
Row_ID  Codes
======  ===============
1       A123,B456,C678
2       X359,C678,F23
3       J3,D24,J36,K994
I want to put the Codes of each row into a list, something like this:
['A123', 'B456', 'C678'], ['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']
I did this:
# an empty list
CodeList = []
for i in df['Codes']:
    CodeList.append(list(i))
but what I get is this:
['A', '1', '2', '3', 'B', ...
How can I do it the right way as mentioned above?

import pandas as pd
data = {"Codes": ["A123, B456, C678", "X359, C678, F23", "J3, D24, J36, K994"]}
df = pd.DataFrame(data)
result = [a.split(", ") for a in df["Codes"]]
print(result)
output
[['A123', 'B456', 'C678'], ['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']]
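Note that the sample data above is separated by a comma plus a space, while the data in the question uses a bare comma. A more forgiving variant (a minimal sketch, reusing the df defined just above) splits on ',' and strips any surrounding whitespace from each code:
result = [[code.strip() for code in row.split(',')] for row in df["Codes"]]
print(result)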

Try splitting on the comma instead:
CodeList.append(i.split(','))

It seems like many of the other answers here might just be plain wrong. (Edit: Currently, they all are)
This code does work:
import pandas as pd
data = {'Codes': ['A123,B456,C678', 'X359,C678,F23', 'J3,D24,J36,K994']}
df = pd.DataFrame(data)
codes_list = df['Codes'].str.split(',').tolist()
codes_list looks like:
[['A123', 'B456', 'C678'], ['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']]
Note that this solution is idiomatic pandas; explicit Python loops over rows should be avoided whenever a vectorized string method such as .str.split is available.

If you want to turn every row of an entire DataFrame into a list (not just one column), df.values.tolist() does that; shown here on random data rather than the Codes column from the question:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 2), columns=list('AB'))
print(df.head())
print(df.values.tolist())
output:
[[-0.2645782053241853, 0.5022937587041725], [1.624868960959602, 0.5086915380333786], [1.3593608874498997, 0.7077939622903995]]

Just remove the list() call from CodeList.append(list(i)) and split on the comma instead:
CodeList = []
for i in df['Codes']:
    CodeList.append(i.split(','))

Related

How to format strings differently in pandas python?

I have this dataframe:
import time
import pandas as pd
df = pd.DataFrame([
    [time.strftime("%Y-%m-%d", time.gmtime(1611161411.46177)), 405.52, 39, 46, 633],
    [time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)), 406.52, 41, 103, 582],
    [time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)), 406.52, 41, 146, 544],
    [time.strftime("%Y-%m-%d", time.gmtime(1611161911.46177)), 406.52, 41, 164, 532]],
    columns=['Date', 'Balance', 'In sell', 'Quantity', 'Profit'])
I want to apply bar styling to each row:
df = df.style.bar()
I would like my final table to have this formatting applied to all rows. I would appreciate any help with this.
try:
df.style.bar(subset=pd.IndexSlice[1:2, ['Quantity', 'Profit']], align='mid', color=['#5fba7d'])
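The subset=pd.IndexSlice[1:2, ...] above limits the bars to rows 1 and 2. If the goal is bars in every row of the Quantity and Profit columns, a minimal sketch (assuming the df built in the question) passes only the column labels as the subset:
# bars in all rows of the Quantity and Profit columns
styled = df.style.bar(subset=['Quantity', 'Profit'], align='mid', color='#5fba7d')
styled  # a Styler renders as HTML in a notebook; styled.to_html() gives the markup elsewhere (recent pandas)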

How to form a matrix of distances between sites in Python?

I have all the data (sites and distances already).
Now I have to form a string matrix to use as an input for another python script.
I have sites and distances as (returned from a query, delimited as here):
A|B|5
A|C|3
A|D|9
B|C|7
B|D|2
C|D|6
How to create this kind of matrix?
A|B|C|D
A|0|5|3|9
B|5|0|7|2
C|3|7|0|6
D|9|2|6|0
This has to be returned as a string from Python, and I'll have more than 1000 sites, so it should be efficient at that size.
Thanks
I have no doubt it could be done in a cleaner way (because Python).
I will do some more research later on but I do want you to have something to start with, so here it is.
import pandas as pd
data = [
    ('A', 'B', 5),
    ('A', 'C', 3),
    ('A', 'D', 9),
    ('B', 'C', 7),
    ('B', 'D', 2),
    ('C', 'D', 6),
]
# add the mirrored pairs so both sides of the diagonal are filled
data.extend([(y, x, val) for x, y, val in data])
df = pd.DataFrame(data, columns=['x', 'y', 'val'])
df = df.pivot_table(values='val', index='x', columns='y')
df = df.fillna(0)  # missing pairs, including the diagonal, become 0
Here is a demo for 1000x1000 (it takes about 2 seconds):
import pandas as pd, itertools as it
data = [(x,y,val) for val,(x,y) in enumerate(it.combinations(range(1000),2))]
data.extend([(y,x,val) for x,y,val in data])
df = pd.DataFrame(data, columns=['x','y','val'])
df = df.pivot_table(values='val', index='x', columns='y')
df = df.fillna(0)
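The question asks for the matrix as a string. One way to get the pipe-delimited layout shown above is to cast the filled matrix to int and serialize it with to_csv; this is a minimal sketch assuming the pivoted df from the first snippet (the header cell for the index label may need a tweak, e.g. via index_label, to match the exact target format):
matrix_str = df.astype(int).to_csv(sep='|', index_label='')
print(matrix_str)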

Count occurrences of number from specific column in python

I am trying to do the equivalent of Excel's COUNTIF() function. I am stuck on how to tell the .count() function to read from a specific column.
I have
df = pd.read_csv('testdata.csv')
df.count('1')
but this does not work, and even if it did it is not specific enough.
I am thinking I may have to use read_csv to read specific columns individually.
Example:
Column name
4
4
3
2
4
1
The function would output that there is one '1', and I could run it again and find out that there are three '4' values, etc.
I got it to work, thank you!
I used:
print(df.col.value_counts().loc['x'])
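Spelled out for the example column above, a minimal sketch of the value_counts approach (the column name column1 is assumed here):
import pandas as pd
df = pd.DataFrame({'column1': [4, 4, 3, 2, 4, 1]})
counts = df['column1'].value_counts()
print(counts.loc[1])  # 1  -> one occurrence of 1
print(counts.loc[4])  # 3  -> three occurrences of 4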
Here is an example of a simple 'countif' recipe you could try:
import pandas as pd
def countif(rng, criteria):
    # count the cells in rng that are equal to criteria, like Excel's COUNTIF
    return rng.eq(criteria).sum()
Example use:
df = pd.DataFrame({'column1': [4, 4, 3, 2, 4, 1],
                   'column2': [1, 2, 3, 4, 5, 6]})
countif(df['column1'], 1)  # -> 1
If all else fails, why not try something like this?
import numpy as np
import pandas
import matplotlib.pyplot as plt
df = pandas.DataFrame(data=np.random.randint(0, 100, size=100), columns=["col1"])
counters = {}
for i in range(len(df)):
    value = df.iloc[i]["col1"]
    if value in counters:
        counters[value] += 1
    else:
        counters[value] = 1
print(counters)
plt.bar(counters.keys(), counters.values())
plt.show()
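For what it's worth, the loop and dictionary above can usually be replaced with value_counts, which also plots directly; a minimal sketch assuming the same random col1 column:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data=np.random.randint(0, 100, size=100), columns=["col1"])
counts = df["col1"].value_counts().sort_index()
print(counts.to_dict())  # same information as the counters dict
counts.plot(kind="bar")
plt.show()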

What is the best way to convert a string in a pandas dataframe to a list?

Basically I have a dataframe with lists that have been read in as strings and I would like to convert them back to lists.
Below shows what I am currently doing, but I'm still learning and feel like there must be a better (more efficient/Pythonic) way to go about this. Any help/constructive criticism would be much appreciated!
import pandas as pd
import ast
df = pd.DataFrame(data=['[-1,0]', '[1]', '[1,2]'], columns = ['example'])
type(df['example'][0])
>> str
n = df.shape[0]
temp = []
temp2 = []
for i in range(n):
    temp = ast.literal_eval(df['example'][i])
    temp2.append(temp)
df['new_col_lists'] = temp2
type(df['new_col_lists'][0])
>> list
Maybe you could use a map:
df['example'] = df['example'].map(ast.literal_eval)
With pandas, there is almost always a way to avoid the for loop.
You can use .apply
Ex:
import pandas as pd
import ast
df = pd.DataFrame(data=['[-1,0]', '[1]', '[1,2]'], columns = ['example'])
df['example'] = df['example'].apply(ast.literal_eval)
print( type(df['example'][0]) )
Output:
<type 'list'>
You could use apply with a lambda which splits and converts your strings:
df['new_col_lists'] = df['example'].apply(lambda s: [int(v.strip()) for v in s[1:-1].split(',')])
Use a float cast instead of int if your lists contain decimals.
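A minimal sketch of that lambda on the example column (it assumes every cell is a well-formed, non-nested bracketed list of numbers):
import pandas as pd
df = pd.DataFrame(data=['[-1,0]', '[1]', '[1,2]'], columns=['example'])
df['new_col_lists'] = df['example'].apply(
    lambda s: [int(v.strip()) for v in s[1:-1].split(',')])
print(df['new_col_lists'].tolist())  # [[-1, 0], [1], [1, 2]]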

How to filter DataFrame string data with something like SQL's LIKE?

I have a dataframe like this
block_name
['循环经济']
['3D打印']
['再生经济']
Now I want to get the rows where block_name contains the word '经济' (economy).
The result that I want is:
block_name
['循环经济']
['再生经济']
And I tried this:
df = df[('经济' in df['block_name'])]
And this:
df = df[(df['block_name'].find('经济') != -1)]
But neither works.
How can I get this result, like SQL's LIKE '%经济%'?
Use .str.contains()
import pandas as pd
df = pd.DataFrame(['循环经济', '3D打印', '再生经济'], columns=['block_name'])
print(df[df['block_name'].str.contains('经济')])
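A hedged practical note: str.contains treats the pattern as a regular expression by default, and missing values propagate as NaN, which breaks boolean indexing; passing regex=False and na=False avoids both issues when you only want a literal substring match:
import pandas as pd
df = pd.DataFrame(['循环经济', '3D打印', '再生经济', None], columns=['block_name'])
mask = df['block_name'].str.contains('经济', regex=False, na=False)
print(df[mask])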
