Hello, this is my CSV data:
Age Name
0 22 George
1 33 lucas
2 22 Nick
3 12 Leo
4 32 Adriano
5 53 Bram
6 11 David
7 32 Andrei
8 22 Sergio
I want to use an if/else statement: for example, if George is an adult, create a new row and insert +.
I mean:
Age Name Adul
22 George +
What is the best way?
This is the code I am using to read the data from the CSV:
import pandas as pd
produtos = pd.read_csv('User.csv', nrows=9)
print(produtos)
for i, produto in produtos.iterrows():
    print(i, produto['Age'], produto['Name'])
IIUC, you want to create a new column (not a row) called "Adul". You can do this with numpy.where:
import numpy as np
# "+" for adults (18 or older), an empty string for everyone else
produtos["Adul"] = np.where(produtos["Age"].ge(18), "+", "")
Edit:
To only do this for a specific name, you could use:
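name = input("Name: ")
if name in produtos["Name"].tolist():
    # Combine both conditions into one boolean mask; a Series cannot be used
    # directly in an `if`, because its truth value is ambiguous
    is_adult = (produtos["Name"] == name) & (produtos["Age"] >= 18)
    produtos.loc[is_adult, "Adul"] = "+"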
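Because input() makes the snippet hard to reuse or test, here is a minimal sketch that takes the name as a parameter instead. The function name mark_adult, the min_age parameter and the sample rows are illustrative assumptions, not part of the original answer.

```python
import pandas as pd


def mark_adult(df: pd.DataFrame, name: str, min_age: int = 18) -> pd.DataFrame:
    """Return a copy of df with "+" in the Adul column for `name` if they are an adult."""
    out = df.copy()
    if "Adul" not in out.columns:
        out["Adul"] = ""  # blank for everyone who has not been marked
    # Both conditions in one mask, so no Series is ever used in a plain `if`
    is_match = (out["Name"] == name) & (out["Age"] >= min_age)
    out.loc[is_match, "Adul"] = "+"
    return out


produtos = pd.DataFrame({"Age": [22, 12, 32], "Name": ["George", "Leo", "Adriano"]})
print(mark_adult(produtos, "George"))
```

Returning a copy leaves the original DataFrame untouched; assign the result back (produtos = mark_adult(produtos, name)) if you want to keep the change.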
You can do this:
import numpy as np
produtos["Adul"] = np.where(produtos["Age"] >= 18, "+", "")
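If you specifically want an if/else, a minimal sketch is to put it in a small function and apply it to the Age column; the boolean-mask version afterwards builds the same values without calling a Python function per row. The helper name adult_flag and the blank string for minors are assumptions.

```python
import pandas as pd

produtos = pd.DataFrame({
    "Age": [22, 33, 22, 12, 32, 53, 11, 32, 22],
    "Name": ["George", "lucas", "Nick", "Leo", "Adriano",
             "Bram", "David", "Andrei", "Sergio"],
})


def adult_flag(age: int) -> str:
    # Plain if/else, evaluated once per value
    if age >= 18:
        return "+"
    else:
        return ""


# Option 1: apply the if/else function to every value of the Age column
produtos["Adul"] = produtos["Age"].apply(adult_flag)

# Option 2: vectorized; start with blanks, then fill only the matching rows
flag = pd.Series("", index=produtos.index)
flag.loc[produtos["Age"] >= 18] = "+"

print(produtos)
```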
I have a .txt file that has the data regarding the total number of queries with valid names. The text in the file came from a SQL Server 19 query output. The database used consists of the results of an algorithm that retrieves the brands most similar to the inserted query. The file looks something like this:
2 16, 42, 44 A MINHA SAÚDE
3 34 !D D DUNHILL
4 33 #MEGA
5 09 (michelin man)
5 12 (michelin man)
6 33 *MONTE DA PEDRA*
7 35 .FOX
8 33 #BATISTA'S BY PITADA VERDE
9 12 #COM
10 41 + NATUREZA HUMANA
11 12 001
12 12 002
13 12 1007
14 12 101
15 12 102
16 12 104
17 37 112 PC
18 33 1128
19 41 123 PILATES
The 1st column has the Query identifier, the 2nd one has the brand classes where the Query can be located and the 3rd one is the Query itself (the spaces came from the SQL Server output formatting).
I then made a Pandas DataFrame in Google Colaboratory where I wanted the columns to be like the ones in the text file. However, when I ran the code, it gave me this:
The code that I wrote is here:
# Dataframe with the total number of queries with valid names:
df = pd.DataFrame(pd.read_table("/content/drive/MyDrive/data/classes/100/queries100.txt", header=None, names=["Query ID", "Query Name", "Classes Where Query is Present"]))
df
I think that this happens because of the commas in the 2nd column but I'm not quite sure. Any suggestions on why this is happening? I already tried read_csv and read_fwf and they were even worse in terms of formatting.
You can use pd.read_fwf() in this case, as your columns have fixed widths:
import pandas as pd
df = pd.read_fwf(
    "/content/drive/MyDrive/data/classes/100/queries100.txt",
    colspecs=[(0,20),(21,40),(40,1000)],
    header=None,
    # same order as in the file: ID, classes, query name
    names=["Query ID", "Classes Where Query is Present", "Query Name"]
)
df.head()
#    Query ID  Classes Where Query is Present  Query Name
# 0         2                      16, 42, 44  A MINHA SAÚDE
# 1         3                              34  !D D DUNHILL
# 2         4                              33  #MEGA
# 3         5                              09  (michelin man)
# 4         5                              12  (michelin man)
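The real column widths are not visible in the post, so the following sketch builds a small fixed-width sample with assumed widths (10 characters, 20 characters, then the rest of the line) and reads it with explicit colspecs. It also passes dtype=str so class codes such as "09" keep their leading zero, then splits the comma-separated classes into one row per class. The widths, sample rows and column names are assumptions; adjust colspecs to your file.

```python
import io

import pandas as pd

# Hypothetical fixed-width layout: 10 characters for the ID, 20 for the
# classes, and the rest of the line for the query name.
rows = [
    ("2", "16, 42, 44", "A MINHA SAÚDE"),
    ("3", "34", "!D D DUNHILL"),
    ("5", "09", "(michelin man)"),
]
text = "\n".join(f"{qid:<10}{classes:<20}{name}" for qid, classes, name in rows)

df = pd.read_fwf(
    io.StringIO(text),
    colspecs=[(0, 10), (10, 30), (30, 200)],
    header=None,
    names=["Query ID", "Classes", "Query Name"],
    dtype=str,  # keep codes such as "09" as text instead of the number 9
)

# One row per (query, class) pair: split on commas, explode, trim spaces
exploded = df.assign(Classes=df["Classes"].str.split(",")).explode("Classes", ignore_index=True)
exploded["Classes"] = exploded["Classes"].str.strip()
print(exploded)
```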
I have a dataframe of people with Age as a column. I would like to match this age to a group, i.e. Baby=0-2 years old, Child=3-12 years old, Young=13-18 years old, Young Adult=19-30 years old, Adult=31-50 years old, Senior Adult=51-65 years old.
I created the lists that define these year groups, e.g. Adult=list(range(31,51)) etc.
How do I match the name of the list 'Adult' to the dataframe by creating a new column?
Sample input: the DataFrame is made up of three columns: df['Name'], df['Country'] and df['Age'].
Name Country Age
Anthony France 15
Albert Belgium 54
.
.
.
Zahra Tunisia 14
So I need to match the Age column against the lists I already have. The output should look like this:
Name Country Age Group
Anthony France 15 Young
Albert Belgium 54 Senior Adult
.
.
.
Zahra Tunisia 14 Young
Thanks!
IIUC I would go with np.select:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Age': [3, 20, 40]})
condlist = [df.Age.between(0,2),
df.Age.between(3,12),
df.Age.between(13,18),
df.Age.between(19,30),
df.Age.between(31,50),
df.Age.between(51,65)]
choicelist = ['Baby', 'Child', 'Young',
'Young Adult', 'Adult', 'Senior Adult']
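# a string default avoids mixing the labels with np.select's numeric default of 0
df['Group'] = np.select(condlist, choicelist, default='Unknown')
Output:
   Age        Group
0    3        Child
1   20  Young Adult
2   40        Adult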
Here's a way to do that using pd.cut:
import numpy as np
import pandas as pd
df = pd.DataFrame({"person_id": range(25), "age": np.random.randint(0, 100, 25)})
print(df.head(10))
==>
person_id age
0 0 30
1 1 42
2 2 78
3 3 2
4 4 44
5 5 43
6 6 92
7 7 3
8 8 13
9 9 76
# include_lowest=True keeps age 0 in the first bin instead of producing NaN
df["group"] = pd.cut(df.age, [0, 18, 50, 100], labels=["child", "adult", "senior"], include_lowest=True)
print(df.head(10))
==>
person_id age group
0 0 30 adult
1 1 42 adult
2 2 78 senior
3 3 2 child
4 4 44 adult
5 5 43 adult
6 6 92 senior
7 7 3 child
8 8 13 child
9 9 76 senior
Per your question, if you have a few lists (like the ones below) and would like to use them for binning, you can do:
# for example, these are the lists
Adult = list(range(18,50))
Child = list(range(0, 18))
Senior = list(range(50, 100))
# Creating bins out of the lists.
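bins = [min(l) for l in [Child, Adult, Senior]]
# add one past the largest age so that it still falls inside the last bin
bins.append(max([max(l) for l in [Child, Adult, Senior]]) + 1)
labels = ["Child", "Adult", "Senior"]
# using the bins; right=False makes each bin [start, end), just like range()
df["group"] = pd.cut(df.age, bins, labels=labels, right=False)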
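To check where boundary ages land, here is a small sketch that rebuilds the bins from the same example lists and passes right=False, so every bin is closed on the left like range(). With pd.cut's default right=True, an age of 18 would fall into the first bin (0, 18] instead. The test ages are made up for illustration.

```python
import pandas as pd

# Same illustrative lists as above: range() excludes its end value
Child = list(range(0, 18))
Adult = list(range(18, 50))
Senior = list(range(50, 100))
groups = [Child, Adult, Senior]

# Left edge of each list, plus one past the largest age: [0, 18, 50, 100]
bins = [min(g) for g in groups] + [max(max(g) for g in groups) + 1]
labels = ["Child", "Adult", "Senior"]

ages = pd.Series([0, 17, 18, 49, 50, 99])
# right=False gives half-open bins [0, 18), [18, 50), [50, 100)
grouped = pd.cut(ages, bins, labels=labels, right=False)
print(pd.DataFrame({"age": ages, "group": grouped}))
```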
To make things clearer for beginners, you can define a function that returns the age group of each person, then use DataFrame.apply() with axis=1 to run it on every row and store the result in a new 'Group' column:
import pandas as pd
def age(row):
    a = row['Age']
    if 0 <= a <= 2:
        return 'Baby'
    elif 2 < a <= 12:
        return 'Child'
    elif 12 < a <= 18:
        return 'Young'
    elif 18 < a <= 30:
        return 'Young Adult'
    elif 30 < a <= 50:
        return 'Adult'
    elif 50 < a <= 65:
        return 'Senior Adult'
    # ages outside 0-65 have no group
    return None
df = pd.DataFrame({'Name':['Anthony','Albert','Zahra'],
'Country':['France','Belgium','Tunisia'],
'Age':[15,54,14]})
df['Group'] = df.apply(age, axis=1)
print(df)
Output:
Name Country Age Group
0 Anthony France 15 Young
1 Albert Belgium 54 Senior Adult
2 Zahra Tunisia 14 Young
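Since the question already has one list per group, another option is to invert those lists into a dictionary from age to group name and use Series.map. This sketch assumes integer ages and uses the group names as dictionary keys; the ranges follow the boundaries given in the question.

```python
import pandas as pd

# One list per group, as in the question (range() excludes its end value)
groups = {
    "Baby": list(range(0, 3)),
    "Child": list(range(3, 13)),
    "Young": list(range(13, 19)),
    "Young Adult": list(range(19, 31)),
    "Adult": list(range(31, 51)),
    "Senior Adult": list(range(51, 66)),
}

# Invert to {age: group name}; each age appears in exactly one list
age_to_group = {age: name for name, ages in groups.items() for age in ages}

df = pd.DataFrame({"Name": ["Anthony", "Albert", "Zahra"],
                   "Country": ["France", "Belgium", "Tunisia"],
                   "Age": [15, 54, 14]})
# Ages not covered by any list (e.g. 70) become NaN
df["Group"] = df["Age"].map(age_to_group)
print(df)
```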
I need to create a new column with data extracted from other columns.
Name Surname Age
Nivea Jones 45
Kelly Pams 68
Matthew Currigan 24
...
I would like to create a new column with only the first letter from the name and surname, i.e.
Name Surname Age Short FN
Nivea Jones 45 NJ
Kelly Pams 68 KP
Matthew Currigan 24 MC
...
I did as follows:
df['Short FN'] = df['Name'].str.get(0) +df['Surname'].str.get(0)
and it works well. However, I need to build a function that takes two columns (in this case, Name and Surname) as parameters:
def sh(x,y):
    df['Short FN'] = df[x].str.get(0) +df[y].str.get(0)
    return
and it does not work, probably because I am passing DataFrame columns as parameters. Also, I do not know whether I should return anything, and if so, what.
Could you please explain how to create a function that takes columns as parameters, and how to use it (it is not clear to me whether I need to iterate through the rows with a for loop)?
You can do this:
def sh(x, y):
    return x[0] + y[0]
df['Short'] = df.apply(lambda x: sh(x['Name'], x['Surname']), axis=1)
print(df)
Name Surname Age Short
0 Nivea Jones 45 NJ
1 Kelly Pams 68 KP
2 Matthew Currigan 24 MC
There are several ways to do that. The simplest way, assuming df is global (as it seems to be in your case), is:
def short_name(col1, col2):
    return df[col1].str[0] + df[col2].str[0]
Calling short_name("Name", "Surname") produces:
0 NJ
1 KP
2 MC
dtype: object
You can now use it in whatever way you want. For example:
df["sn"] = short_name("Name", "Surname")
print(df)
# produces:
Name Surname Age sn
0 Nivea Jones 45 NJ
1 Kelly Pams 68 KP
2 Matthew Currigan 24 MC
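If df should not be a global, a minimal variation passes the DataFrame in as a parameter and returns a Series, leaving the caller to decide where to store it. No row loop is needed because .str[0] works on the whole column at once. The names initials and people are illustrative.

```python
import pandas as pd


def initials(frame: pd.DataFrame, first_col: str, last_col: str) -> pd.Series:
    """Return the first letter of first_col followed by the first letter of last_col."""
    # .str[0] takes the first character of every value in the column at once
    return frame[first_col].str[0] + frame[last_col].str[0]


people = pd.DataFrame({"Name": ["Nivea", "Kelly", "Matthew"],
                       "Surname": ["Jones", "Pams", "Currigan"],
                       "Age": [45, 68, 24]})
people["Short FN"] = initials(people, "Name", "Surname")
print(people)
```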
I have a sample schema consisting of 12 columns, and each column has a certain set of categories. Now I need to simulate that data as a DataFrame of around 1000 rows. How do I go about it?
I have used the code below to generate data for each column:
Location = ['USA','India','Prague','Berlin','Dubai','Indonesia','Vienna']
Location = random.choice(Location)
Age = ['Under 18','Between 18 and 64','65 and older']
Age = random.choice(Age)
Gender = ['Female','Male','Other']
Gender = random.choice(Gender)
and so on
I need the output to look like this:
Location Age Gender
Dubai Under 18 Female
India 65 and older Male
.
.
.
.
You can create each column one by one using np.random.choice:
import numpy as np
import pandas as pd
df = pd.DataFrame()
N = 1000
df["Location"] = np.random.choice(Location, size=N)
df["Age"] = np.random.choice(Age, size=N)
df["Gender"] = np.random.choice(Gender, size=N)
Or do that using a list comprehension:
column_to_choice = {"Location": Location, "Age": Age, "Gender": Gender}
df = pd.DataFrame(
    [np.random.choice(column_to_choice[c], 1000) for c in column_to_choice]
).T
df.columns = list(column_to_choice.keys())
Result:
>>> print(df.head())
Location Age Gender
0 India 65 and older Female
1 Berlin Between 18 and 64 Female
2 USA Between 18 and 64 Male
3 Indonesia Under 18 Male
4 Dubai Under 18 Other
You can write a for loop over the number of rows you want in your DataFrame and build a list of dictionaries, one per row. Then use that list of dictionaries to create the DataFrame.
In [15]: list2 = []
In [16]: for i in range(5):
    ...:     loc = random.choice(Location)
    ...:     age = random.choice(Age)
    ...:     gen = random.choice(Gender)
    ...:     k = {'Location': loc, 'Age': age, 'Gender': gen}
    ...:     list2.append(k)
    ...:
In [17]: import pandas as pd
In [18]: df = pd.DataFrame(list2)
In [19]: df
Out[19]:
Age Gender Location
0 Between 18 and 64 Other Berlin
1 65 and older Other USA
2 65 and older Male Dubai
3 Between 18 and 64 Male Dubai
4 Between 18 and 64 Male Indonesia
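For all 12 columns, a sketch that scales better keeps the schema in one dictionary and draws each column with NumPy's Generator.choice. Only three columns are shown; the schema dictionary, the simulate helper and the fixed seed are assumptions, and every column is sampled uniformly and independently of the others.

```python
import numpy as np
import pandas as pd

# Hypothetical schema: column name -> allowed categories.
# Add the remaining columns of the real schema in the same way.
schema = {
    "Location": ["USA", "India", "Prague", "Berlin", "Dubai", "Indonesia", "Vienna"],
    "Age": ["Under 18", "Between 18 and 64", "65 and older"],
    "Gender": ["Female", "Male", "Other"],
}


def simulate(schema: dict, n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Draw n_rows values for each column, independently and uniformly."""
    rng = np.random.default_rng(seed)  # a fixed seed makes the sample reproducible
    return pd.DataFrame({col: rng.choice(cats, size=n_rows) for col, cats in schema.items()})


df = simulate(schema, 1000)
print(df.head())
```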
I have written the function below in Python:
def proc_summ(df,var_names_in,var_names_group):
    df['Freq']=1
    df_summed=pd.pivot_table(df,index=(var_names_group),
                             values=(var_names_in),
                             aggfunc=[np.sum],fill_value=0,margins=True,margins_name='Total').reset_index()
    df_summed.columns = df_summed.columns.map(''.join)
    df_summed.columns = [x.strip().replace('sum', '') for x in df_summed.columns]
    string_repr = df_summed.to_string(index=False,justify='center').splitlines()
    string_repr.insert(1, "-" * len(string_repr[0]))
    string_repr.insert(len(df_summed.index)+1, "-" * len(string_repr[0]))
    out = '\n'.join(string_repr)
    print(out)
And below is the code I am using to call the function:
proc_summ (
df,
var_names_in=["Freq","sal"] ,
var_names_group=["name","age"])
and below is the output:
name age Freq sal
--------------------
Arik 32 1 100
David 44 2 260
John 33 1 200
John 34 1 300
Peter 33 1 100
--------------------
Total 6 960
How can I print the data in the center of the screen, like this:
name age Freq sal
--------------------
Arik 32 1 100
David 44 2 260
John 33 1 200
John 34 1 300
Peter 33 1 100
--------------------
Total 6 960
If you are using Python 3, you can try something like this:
import shutil
columns = shutil.get_terminal_size().columns
print("hello world".center(columns))
Since you are using a DataFrame, you can try something like this:
import shutil
import pandas as pd
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)
# convert DataFrame to string
df_string = df.to_string()
df_split = df_string.split('\n')
columns = shutil.get_terminal_size().columns
# loop over every line of the string, including the header line
for line in df_split:
    print(line.center(columns))
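Calling str.center on each line separately can shift lines of different lengths by different amounts. A minimal sketch that keeps the columns aligned pads every line by the same amount; center_block is a hypothetical helper, and the output of proc_summ could be passed to it if the function returned out instead of printing it.

```python
import shutil

import pandas as pd


def center_block(text: str, width: int) -> str:
    """Shift every line right by the same amount so the block is centered as a whole."""
    lines = text.splitlines()
    block_width = max((len(line) for line in lines), default=0)
    pad = " " * max((width - block_width) // 2, 0)  # no padding if the block is too wide
    return "\n".join(pad + line for line in lines)


df = pd.DataFrame({"name": ["Arik", "David"], "sal": [100, 260]})
columns = shutil.get_terminal_size().columns
print(center_block(df.to_string(index=False), columns))
```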