Currently I have built a network using NetworkX from a source-target dataframe:
import networkx as nx
G = nx.from_pandas_edgelist(df, source='Person1', target='Person2')
Dataset
   Person1           Age  Person2      Wedding
0  Adam John         3    Yao Ming     Green
1  Mary Abbey        5    Adam Lebron  Green
2  Samuel Bradley    24   Mary Lane    Orange
3  Lucas Barney      12   Julie Lime   Yellow
4  Christopher Rice  0.9  Matt Red     Green
I would like to set the size/weights of the links based on the Age column (i.e. age of marriage) and the colour of nodes as in the column Wedding.
I know that, if I wanted to add an edge, I could set it as follows: G.add_edge(Person1, Person2, size=10). For applying different colours to nodes, I should probably use the parameter node_color=color_map, where color_map is the list of colours in the Wedding column (if I am right).
Can you please explain to me how to apply these settings to my case?
IIUC:
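import networkx as nx
import pandas as pd
# raw string so that \s is not treated as an invalid escape sequence
df = pd.read_clipboard(sep=r'\s\s+')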
collist = df.drop('Age', axis=1).melt('Wedding')
collist
G = nx.from_pandas_edgelist(df, source='Person1', target='Person2', edge_attr='Age')
pos=nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, nodelist=collist['value'], node_color=collist['Wedding'])
nx.draw_networkx_edges(G, pos, width = [i['Age'] for i in dict(G.edges).values()])
Output:
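A variant of the approach above, as a minimal sketch: it assumes the dataset from the question is typed in by hand instead of read from the clipboard, and that each person inherits the Wedding colour of the row they appear in. The names color_of, node_colors, edge_widths and the helper draw are illustrative, not part of NetworkX. Both lists are built in the graph's own iteration order so that colours and widths line up with the nodes and edges being drawn.

```python
import networkx as nx
import pandas as pd

# Dataset from the question, entered by hand (assumed column split)
df = pd.DataFrame({
    "Person1": ["Adam John", "Mary Abbey", "Samuel Bradley",
                "Lucas Barney", "Christopher Rice"],
    "Age": [3, 5, 24, 12, 0.9],
    "Person2": ["Yao Ming", "Adam Lebron", "Mary Lane",
                "Julie Lime", "Matt Red"],
    "Wedding": ["Green", "Green", "Orange", "Yellow", "Green"],
})

# Keep Age as an edge attribute
G = nx.from_pandas_edgelist(df, source="Person1", target="Person2",
                            edge_attr="Age")

# Map every person (from either column) to the Wedding colour of their row
color_of = {}
for row in df.itertuples(index=False):
    color_of[row.Person1] = row.Wedding
    color_of[row.Person2] = row.Wedding

# Build both lists in the graph's iteration order
node_colors = [color_of[n].lower() for n in G.nodes()]
edge_widths = [age for _, _, age in G.edges(data="Age")]


def draw(G, node_colors, edge_widths):
    """Hypothetical helper: draw the graph with the colours and widths."""
    import matplotlib.pyplot as plt  # only needed when actually drawing

    pos = nx.spring_layout(G, seed=0)  # fixed seed for a repeatable layout
    nx.draw_networkx(G, pos, node_color=node_colors, width=edge_widths)
    plt.show()
```

Calling draw(G, node_colors, edge_widths) renders the figure. An Age of 24 gives a very thick line, so you may want to scale the widths first.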
I want to display the information in the following example dataset as a directed graph with multiple edges between nodes.
Attached is an example of the kind of graph I expect, as well as my code, which does not produce the expected output.
Thanks,
G = nx.from_pandas_edgelist(df, 'source', 'destination', edge_attr='number of passengers', create_using=nx.DiGraph())
pos = nx.random_layout(G)
nx.draw(G,
pos=pos)
edge_labels = nx.get_edge_attributes(G, "Edge_label")
nx.draw(G, with_labels=True)
nx.draw_networkx_edge_labels(G, pos, edge_labels)
plt.show()
date        source  destination  number of passengers
2019-01-01  NY      BERLIN       10
2019-01-02  NY      PARIS        50
2019-01-03  NY      BERLIN       40
2019-01-04  BERLIN  PARIS        20
2019-01-05  NY      PARIS        15
2019-01-06  BERLIN  NY           17
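One possible direction, sketched under assumptions: nx.DiGraph keeps only one edge per ordered pair of nodes, so repeated routes such as NY to BERLIN overwrite each other, whereas nx.MultiDiGraph keeps them as parallel edges. The edge attribute in the data is 'number of passengers', not "Edge_label", so that is the name to query. The helper draw and its curvature values are illustrative choices, not a definitive recipe.

```python
import networkx as nx
import pandas as pd

# Example dataset from the question
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-02", "2019-01-03",
             "2019-01-04", "2019-01-05", "2019-01-06"],
    "source": ["NY", "NY", "NY", "BERLIN", "NY", "BERLIN"],
    "destination": ["BERLIN", "PARIS", "BERLIN", "PARIS", "PARIS", "NY"],
    "number of passengers": [10, 50, 40, 20, 15, 17],
})

# MultiDiGraph keeps parallel edges; each one gets an integer key (0, 1, ...)
G = nx.from_pandas_edgelist(
    df, source="source", target="destination",
    edge_attr="number of passengers", create_using=nx.MultiDiGraph,
)

# For a multigraph the result is keyed by (source, destination, key)
labels = nx.get_edge_attributes(G, "number of passengers")


def draw(G):
    """Hypothetical helper: draw parallel edges as arcs of different curvature."""
    import matplotlib.pyplot as plt  # only needed when actually drawing

    pos = nx.circular_layout(G)
    nx.draw_networkx_nodes(G, pos)
    nx.draw_networkx_labels(G, pos)
    for u, v, key in G.edges(keys=True):
        # a larger key bends the arc more, so parallel edges do not overlap
        nx.draw_networkx_edges(
            G, pos, edgelist=[(u, v)],
            connectionstyle=f"arc3,rad={0.15 * (key + 1)}",
        )
    plt.show()
```

Placing a passenger count next to each curved edge takes extra work, because the label positions have to follow the arcs. The labels dictionary above holds the values to display.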
Working in Python with a dataframe, I'm trying to match certain rows and create a new column based on successful matching - e.g. if 'Breed' + 'Color' match, put the 'Name' of the matched row in the 'Mate' column of the Male in the pair. For example, in the table below Adam/Eve and Antony/Cleopatra should be matched, resulting in Eve and Cleopatra being put in the 'Mate' column for Adam and Antony, respectively. Since Clyde and Beauty have different breeds, no match is made for them.
Name       Breed    Color      Sex     Mate?
Adam       Boxer    White      Male    (Eve)
Eve        Boxer    White      Female
Antony     Lab      Chocolate  Male    (Cleopatra)
Cleopatra  Lab      Chocolate  Female
Clyde      Husky    Gray       Male
Beauty     Bulldog  Gray       Female
Thanks!!
from collections import defaultdict
# First collect the potential names of mates
mates = defaultdict(list)
for row in df.itertuples():
    props = (row.Breed, row.Color)
    mates[props].append(row.Name)
# Secondly create Mate column
def return_mates(name, breed, color):
    match = (breed, color)
    return [m for m in mates[match] if m != name]
df.loc[:, 'Mate'] = df[['Name', 'Breed', 'Color']].apply(lambda x: return_mates(*x), axis=1)
One way:
df['Mate?'] = df.Name.map(dict(df.groupby(['Breed', 'Color'])['Name'].agg(list).apply(pd.Series).dropna().values)).fillna('')
OUTPUT:
   Name       Breed    Color      Sex     Mate?
0  Adam       Boxer    White      Male    Eve
1  Eve        Boxer    White      Female
2  Antony     Lab      Chocolate  Male    Cleopatra
3  Cleopatra  Lab      Chocolate  Female
4  Clyde      Husky    Gray       Male
5  Beauty     Bulldog  Gray       Female
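The one-liner above relies on the male being listed first within each Breed/Color group. A more explicit alternative, as a sketch that assumes at most one female per Breed/Color pair is wanted as a mate: merge the frame against its female rows and keep the result only for males. The names females and out are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Adam", "Eve", "Antony", "Cleopatra", "Clyde", "Beauty"],
    "Breed": ["Boxer", "Boxer", "Lab", "Lab", "Husky", "Bulldog"],
    "Color": ["White", "White", "Chocolate", "Chocolate", "Gray", "Gray"],
    "Sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
})

# One candidate mate per (Breed, Color): the first female found
females = (
    df[df["Sex"] == "Female"]
    .drop_duplicates(["Breed", "Color"])
    .rename(columns={"Name": "Mate"})[["Breed", "Color", "Mate"]]
)

# A left merge keeps every original row in its original order
out = df.merge(females, on=["Breed", "Color"], how="left")

# Keep the mate only on male rows; everything else becomes an empty string
out["Mate"] = out["Mate"].where(out["Sex"] == "Male").fillna("")
```

Because the match is made on Sex as well as on Breed and Color, the row order of the input no longer matters.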
I want to assign some selected nominal values randomly to rows. For example:
I have three nominal values ["apple", "orange", "banana"].
Before assigning these values randomly to rows:
**Name Fruit**
Jack
Julie
Juana
Jenny
Christina
Dickens
Robert
Cersei
After assigning these values randomly to rows:
**Name Fruit**
Jack Apple
Julie Orange
Juana Apple
Jenny Banana
Christina Orange
Dickens Orange
Robert Apple
Cersei Banana
How can I do this using pandas dataframe?
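You can use np.random.choice with your values (the pd.np alias has been removed from recent pandas versions, so import NumPy directly):
import numpy as np
vals = ["apple", "orange", "banana"]
df['Fruit'] = np.random.choice(vals, len(df))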
>>> df
Name Fruit
0 Jack apple
1 Julie orange
2 Juana apple
3 Jenny orange
4 Christina apple
5 Dickens banana
6 Robert orange
7 Cersei orange
You can create a DataFrame in pandas and then assign random choices using NumPy:
import numpy as np
import pandas as pd
ex2 = pd.DataFrame({'Name':['Jack','Julie','Juana','Jenny','Christina','Dickens','Robert','Cersei']})
ex2['Fruits'] = np.random.choice(['Apple','Orange','Banana'], ex2.shape[0])
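If the assignment needs to be repeatable, one option is NumPy's Generator API with a fixed seed. This is a small sketch. The seed value 42 is arbitrary and the name rng is just a convention.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Jack", "Julie", "Juana", "Jenny",
                            "Christina", "Dickens", "Robert", "Cersei"]})
vals = ["apple", "orange", "banana"]

# A seeded generator returns the same draws on every run
rng = np.random.default_rng(seed=42)

# One random value per row, sampled with replacement
df["Fruit"] = rng.choice(vals, size=len(df))
```

Dropping the seed argument gives a different assignment on each run, which matches the behaviour of the answers above.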
I have two series that contain the same data, but with a different number of occurrences of each value. I want to compare the two series side by side in a bar chart. Below is what I've done so far.
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
width = 0.3
tree_amount15.plot(kind='bar', color='red', ax=ax, width=width, position=1, label='NYC')
queens_tree_types.plot(kind='bar', color='blue', ax=ax, width=width, position=0, label='Queens')
plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
           ncol=2, mode="expand", borderaxespad=0.)
ax.set_ylabel('Total trees')
ax.set_xlabel('Tree names')
plt.show()
Which gives me the following chart:
The problem is that, although both series contain the same 'Tree names', their 'Total trees' counts differ, so the ranking differs too: Callery pear, for example, is #3 in 'tree_amount15' but #5 in 'queens_tree_types'. How can I order the series so that each value is plotted against the right label? Right now the chart shows only the labels of the series that is plotted first, so the values of the second series are misleading.
Any hints?
Here's how the two series look when I call value_counts() on them.
tree_amount15:
London planetree 87014
honeylocust 64264
Callery pear 58931
pin oak 53185
Norway maple 34189
littleleaf linden 29742
cherry 29279
Japanese zelkova 29258
ginkgo 21024
Sophora 19338
red maple 17246
green ash 16251
American linden 13530
silver maple 12277
sweetgum 10657
northern red oak 8400
silver linden 7995
American elm 7975
maple 7080
purple-leaf plum 6879
queens_tree_types:
London planetree 31111
pin oak 22610
honeylocust 20290
Norway maple 19407
Callery pear 16547
cherry 13497
littleleaf linden 11902
Japanese zelkova 8987
green ash 7389
silver maple 6116
ginkgo 5971
Sophora 5386
red maple 4935
American linden 4769
silver linden 4146
purple-leaf plum 3035
maple 2992
northern red oak 2697
sweetgum 2489
American elm 1709
You can create a data frame from your two series that uses the tree name index. By default pandas will sort the index alphabetically, so we tell it to sort using the values of NYC. With both series as columns, we can use a single call to the plot method to put them on the same graph.
df = pd.concat([tree_amount15, queens_tree_types], axis=1,
               keys=['NYC', 'Queens'])  # keys sets the column names
df = df.sort_values('NYC', ascending=False)  # sort_values returns a new frame, so reassign it
df.plot.bar(color=['red','blue'])
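To see the alignment at work, here is a small sketch using only the first few rows of each series from the question. It builds the frame from a dict of Series, an alternative to pd.concat that also aligns on the index, so each tree name gets its NYC and Queens counts on the same row regardless of the original ordering.

```python
import pandas as pd

# Shortened samples of the two value_counts() results from the question
tree_amount15 = pd.Series({
    "London planetree": 87014, "honeylocust": 64264,
    "Callery pear": 58931, "pin oak": 53185,
})
queens_tree_types = pd.Series({
    "London planetree": 31111, "pin oak": 22610,
    "honeylocust": 20290, "Callery pear": 16547,
})

# A dict of Series is aligned on the index labels, not on position
df = pd.DataFrame({"NYC": tree_amount15, "Queens": queens_tree_types})

# Order the rows by the NYC counts, largest first
df = df.sort_values("NYC", ascending=False)
```

After this, df.plot.bar(color=['red', 'blue']) draws both columns against a single, shared set of labels.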
I have a transaction data set and I want to transform it so that there is one row per customer ID. A sample is given below.
CustomerID Description
17850 WHITE HANGING HEART T-LIGHT HOLDER
17850 WHITE METAL LANTERN
13047 ASSORTED COLOUR BIRD ORNAMENT
13047 POPPY'S PLAYHOUSE BEDROOM
13047 POPPY'S PLAYHOUSE KITCHEN
I want this data set in the following form:
17850 WHITE HANGING HEART T-LIGHT HOLDER, WHITE METAL LANTERN
13047 ASSORTED COLOUR BIRD ORNAMENT,POPPY'S PLAYHOUSE BEDROOM, POPPY'S PLAYHOUSE KITCHEN
The dataset is in CSV format, with each value in a separate cell.
Can anyone suggest a method to do this in Excel, R, or Python?
You can use the aggregate() function. I created my own data here, but you can do the same for your data frame above. The Texts are concatenated per Customer number.
> df <- data.frame(Customer = c(1,1,2,3,3,4), Texts = c("AAA","aaa","BBB","bbb","CCC","ccc"))
> df
Customer Texts
1 1 AAA
2 1 aaa
3 2 BBB
4 3 bbb
5 3 CCC
6 4 ccc
> aggregate(Texts~Customer,toString,data=df)
Customer Texts
1 1 AAA, aaa
2 2 BBB
3 3 bbb, CCC
4 4 ccc
In Python, you can use pandas.
Install it, then try:
import pandas as pd
# Read the CSV file
df = pd.read_csv('yourFileName.csv')
# Group by CustomerID and join Descriptions with commas
result = df.groupby('CustomerID')['Description'].apply(','.join).reset_index()
# Save the grouped result (not the original df) in a CSV file
result.to_csv('resultFileName.csv', index=False)
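The same grouping can be tried without any files, as a sketch on the five sample rows from the question typed in by hand. Passing sort=False is an optional choice made here so that customers keep their order of first appearance instead of being sorted by ID.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "CustomerID": [17850, 17850, 13047, 13047, 13047],
    "Description": [
        "WHITE HANGING HEART T-LIGHT HOLDER",
        "WHITE METAL LANTERN",
        "ASSORTED COLOUR BIRD ORNAMENT",
        "POPPY'S PLAYHOUSE BEDROOM",
        "POPPY'S PLAYHOUSE KITCHEN",
    ],
})

# One row per customer, descriptions joined into a single string
result = (
    df.groupby("CustomerID", sort=False)["Description"]
    .agg(", ".join)
    .reset_index()
)
```

The result has two rows, one for customer 17850 and one for customer 13047, each holding that customer's descriptions separated by commas.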
Other ways to do this include using plyr and data.table. data.table is probably more efficient and simpler, and it also offers more control.
library(plyr)
ddply(df, .(ID), summarize, Text = paste(Text, collapse = ","))
or
library(data.table)
DT <- data.table(df)
# group the table by ID and then add a new column by pasting the list
# of values in each group together.
DT[, list(Text = paste(Text, collapse = ",")), by = ID]
ID Text
1: 17850 WHITE HANGING HEART T-LIGHT HOLDER,WHITE METAL LANTERN
2: 13047 ASSORTED COLOUR BIRD ORNAMENT,POPPY'S PLAYHOUSE BEDROOM, POPPY'S PLAYHOUSE KITCHEN
Data
df <- data.frame(ID = c(17850,17850,13047,13047,13047),
Text = c("WHITE HANGING HEART T-LIGHT HOLDER","WHITE METAL LANTERN",
" ASSORTED COLOUR BIRD ORNAMENT","POPPY'S PLAYHOUSE BEDROOM",
" POPPY'S PLAYHOUSE KITCHEN"))