I have an Excel file that contains 5 sheets. I need to create 5 graphs, one per sheet, plotting x against y, and I want to do it in a loop. How can I do this?
You can load all the sheets:
import pandas as pd
f = pd.ExcelFile('users.xlsx')
Then extract sheet names:
>>> f.sheet_names
['User_info', 'purchase', 'compound', 'header_row5']
Now you can loop over the sheet names above. For example, to parse one sheet:
>>> f.parse(sheet_name = 'User_info')
      User Name  Country      City  Gender  Age
0  Forrest Gump      USA  New York       M   50
1     Mary Jane   CANADA   Tornoto       F   30
2  Harry Porter       UK    London       M   20
3     Jean Grey    CHINA  Shanghai       F   30
The loop looks like this:
for name in f.sheet_names:
    df = f.parse(sheet_name=name)
    # do something here
No need for separate variables; create the output lists and use this simple loop:
data = pd.ExcelFile("DCB_200_new.xlsx")
l = ['DCB_200_9', 'DCB_200_15', 'DCB_200_23', 'DCB_200_26', 'DCB_200_28']
x = []
y = []
for e in l:
    x.append(pd.read_excel(data, e, usecols=[2], skiprows=[0,1]))
    y.append(pd.read_excel(data, e, usecols=[1], skiprows=[0,1]))
But, ideally you should be able to load the data only once and loop over the sheets/columns. Please update your question with more info.
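As a rough sketch of the "load once, then loop" idea: `pd.read_excel(..., sheet_name=None)` returns a dict that maps each sheet name to a DataFrame, so a single loop can draw every graph. The helper name `plot_sheets`, the column names `x` and `y`, and the in-memory demo data are assumptions made for illustration; they are not taken from the workbook in the question.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_sheets(sheets, x_col, y_col):
    """Draw one x/y line plot per sheet; `sheets` maps sheet name -> DataFrame."""
    fig, axes = plt.subplots(len(sheets), 1,
                             figsize=(6, 3 * len(sheets)), squeeze=False)
    for ax, (name, df) in zip(axes[:, 0], sheets.items()):
        ax.plot(df[x_col], df[y_col])
        ax.set_title(name)
        ax.set_xlabel(x_col)
        ax.set_ylabel(y_col)
    fig.tight_layout()
    return fig


# Hypothetical data standing in for two sheets of the workbook.
demo = {
    "DCB_200_9": pd.DataFrame({"x": [0, 1, 2], "y": [0.0, 1.0, 4.0]}),
    "DCB_200_15": pd.DataFrame({"x": [0, 1, 2], "y": [1.0, 3.0, 5.0]}),
}
fig = plot_sheets(demo, "x", "y")

# With a real file, the dict could come from a single read, for example:
# sheets = pd.read_excel("DCB_200_new.xlsx", sheet_name=None, skiprows=[0, 1])
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or save the result.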
I have an Excel file with 99 sheets, and every sheet has 1 million columns and six rows. How can I use Python to search for a word and find the row that contains it?
example:
0  Name             Age  Class
1  Michael Jackson  17   B
2  Barak Obama      38   A
3  Ariana Grande    82   C
I want a function that takes a name and returns the name, age, and class, like this:
Michael Jackson,17,B
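One possible approach, sketched under assumptions: read every sheet into a dict of DataFrames (for a real file, `pd.read_excel(path, sheet_name=None)` returns such a dict) and filter each one on the name column. The function name `find_rows`, the `column` parameter, and the demo sheet below are hypothetical.

```python
import pandas as pd


def find_rows(sheets, name, column="Name"):
    """Return matching rows from all sheets as comma-joined strings."""
    matches = []
    for df in sheets.values():
        hits = df[df[column] == name]  # boolean mask: exact match on the name
        for row in hits.itertuples(index=False):
            matches.append(",".join(str(value) for value in row))
    return matches


# Hypothetical stand-in for one sheet of the workbook.
demo = {
    "Sheet1": pd.DataFrame({
        "Name": ["Michael Jackson", "Barak Obama", "Ariana Grande"],
        "Age": [17, 38, 82],
        "Class": ["B", "A", "C"],
    })
}
result = find_rows(demo, "Michael Jackson")
print(result)
```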
My DataFrame
df = pandas.DataFrame({
    "City": ["Chennai", "Banglore", "Mumbai", "Delhi", "Chennai", "Banglore", "Mumbai", "Delhi"],
    "Name": ["Praveen", "Dhansekar", "Naveen", "Kumar", "SelvaRani", "Nithya", "Suji", "Konsy"],
    "Gender": ["M", "M", "M", "M", "F", "F", "F", "F"]})
When printed, it appears like this, df=
City      Name       Gender
Chennai   Praveen    M
Banglore  Dhansekar  M
Mumbai    Naveen     M
Delhi     Kumar      M
Chennai   SelvaRani  F
Banglore  Nithya     F
Mumbai    Suji       F
Delhi     Konsy      F
I want to save the data in separate DataFrames as follows:
Chennai=
City      Name       Gender
Chennai   Praveen    M
Chennai   SelvaRani  F
Banglore=
City      Name       Gender
Banglore  Dhansekar  M
Banglore  Nithya     F
Mumbai=
City      Name       Gender
Mumbai    Naveen     M
Mumbai    Suji       F
Delhi=
City      Name       Gender
Delhi     Kumar      M
Delhi     Konsy      F
My code is:
D_name = sorted(df['City'].unique())
for i in D_name:
    f"{i}" = df[df['City'] == i]
The dataset has more than 100 cities. How do I write a for loop in Python to get multiple DataFrames as output?
You can groupby and create a dictionary like so:
dict_dfs = dict(iter(df.groupby("City")))
Then you can directly access individual cities:
Delhi = dict_dfs["Delhi"]
print(Delhi)
# result:
    City   Name  Gender
3  Delhi  Kumar       M
7  Delhi  Konsy       F
You could do something like this:
groups = df.groupby(by='City')
Bangalore = groups.get_group('Banglore')  # the key must match the spelling in the data ("Banglore")
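With more than 100 cities, it is usually easier to keep the pieces in a dictionary and loop over it than to create one variable per city. A minimal sketch using the DataFrame from the question; the name `frames` and the `reset_index` step are illustrative choices, not part of either answer:

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Chennai", "Banglore", "Mumbai", "Delhi",
             "Chennai", "Banglore", "Mumbai", "Delhi"],
    "Name": ["Praveen", "Dhansekar", "Naveen", "Kumar",
             "SelvaRani", "Nithya", "Suji", "Konsy"],
    "Gender": ["M", "M", "M", "M", "F", "F", "F", "F"],
})

# One DataFrame per city, keyed by the city name.
frames = {city: group.reset_index(drop=True)
          for city, group in df.groupby("City")}

for city, frame in frames.items():
    print(city, len(frame))
```

Each piece is then available as `frames["Delhi"]`, `frames["Mumbai"]`, and so on, however many cities the data contains.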
I have a column Name of string data type. I want to take all the words except the last one and put them in a new column FName, which I managed to do like this:
df = pd.DataFrame({'Name': ['John A Sether', 'Robert D Junior', 'Jonny S Rab'],
'Age':[32, 34, 36]})
df['FName'] = df['Name'].str.split(' ').str[0:-1]
              Name  Age        FName
0    John A Sether   32    [John, A]
1  Robert D Junior   34  [Robert, D]
2      Jonny S Rab   36   [Jonny, S]
But the new column FName contains lists, which I don't want. I want it to look like: John A.
I tried converting the list to a string, but the result does not seem right.
Any suggestions?
You can use .str.rsplit:
df['FName'] = df['Name'].str.rsplit(n=1).str[0]
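Or you can use .str.extract (note that this pattern captures at most the first two words, so it only drops the last word when the name has exactly three words):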
df['FName'] = df['Name'].str.extract(r'(\S+\s?\S*)', expand=False)
Or, you can chain .str.join after .str.split:
df['FName'] = df['Name'].str.split().str[:-1].str.join(' ')
              Name  Age     FName
0    John A Sether   32    John A
1  Robert D Junior   34  Robert D
2      Jonny S Rab   36   Jonny S
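The three variants agree on the three-word names above, but they can differ on other inputs. The following sketch compares them on names of two, three, and four words; the sample names are made up for illustration.

```python
import pandas as pd

names = pd.Series(["John A Sether", "Mary Jane", "Ana Maria de Souza"])

# Split once from the right and keep everything before the last word.
via_rsplit = names.str.rsplit(n=1).str[0]

# Split into words, drop the last one, join the rest back together.
via_join = names.str.split().str[:-1].str.join(" ")

# Regex from the answer: first word plus an optional second word.
via_extract = names.str.extract(r"(\S+\s?\S*)", expand=False)

print(pd.DataFrame({"rsplit": via_rsplit,
                    "join": via_join,
                    "extract": via_extract}))
```

Here `rsplit` and `split`/`join` both remove exactly the last word, while the `extract` pattern keeps the first two words regardless of how many follow.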
I'm new to pandas and so am a bit unfamiliar with how it works. I have processed some data and obtained the results I want, however, I am having trouble figuring out how to format the output with print. For instance, I only want to display certain rows of data, as well as putting certain values in ().
From doing this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
This is the output I get by doing print(tallmen):
   id           name  gender     state  height
6   5     Smith, Bob       M  New York    73.5
2   7  Wright, Frank       M  Kentucky    75.2
And this is the output I want:
Smith, Bob (M) 6' 1.5"
Wright, Frank (M) 6' 3.2"
I tried to use tallmen as a dictionary, but that gave me an error, so I'm not quite sure what to do. Additionally, is there a way to manipulate the height values so that I can reformat them (i.e. display them in the ft/in format shown above)?
You can create a new column this way (the remainder is rounded to one decimal so that float artifacts such as 3.2000000000000028 do not show up in the string):
In [207]: df
Out[207]:
   id           name  gender     state  height
6   5     Smith, Bob       M  New York    73.5
2   7  Wright, Frank       M  Kentucky    75.2
In [208]: df['new'] = (
     ...:     df.name + ' (' + df.gender + ') ' +
     ...:     (df.height // 12).astype(int).astype(str) +
     ...:     "' " + (df.height % 12).round(1).astype(str) + '"')
     ...:
In [209]: df
Out[209]:
   id           name  gender     state  height                        new
6   5     Smith, Bob       M  New York    73.5     Smith, Bob (M) 6' 1.5"
2   7  Wright, Frank       M  Kentucky    75.2  Wright, Frank (M) 6' 3.2"
My professor helped me figure this out. Really what I needed was to know how to iterate through values in the DataFrame. My solution looks like this:
df = pd.read_csv('data_file.csv')
tallmen = df[df['gender'] == 'M'].nlargest(2, 'height')
for i, val in tallmen.iterrows():
    feet = val['height'] // 12
    inches = val['height'] % 12
    # Escape the closing inch mark, and use %.1f so the fractional inches are kept
    print("%s (%s) %i' %.1f\"" % (val['name'], val['gender'], feet, inches))
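The height conversion can also be isolated in a small helper, which makes it easy to check on its own. This is only a sketch: the function name `format_height` and the choice of one decimal place are assumptions based on the desired output in the question.

```python
def format_height(total_inches):
    """Format a height given in inches as feet and inches, e.g. 6' 1.5"."""
    # Round first so float noise does not leak into the result.
    feet, inches = divmod(round(total_inches, 1), 12)
    return f"{int(feet)}' {inches:.1f}\""


# The two rows from the question, as (name, gender, height) tuples.
rows = [("Smith, Bob", "M", 73.5), ("Wright, Frank", "M", 75.2)]
lines = [f"{name} ({gender}) {format_height(height)}"
         for name, gender, height in rows]
print("\n".join(lines))
```

With a DataFrame, the same helper could be applied to a column with `tallmen['height'].map(format_height)`.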
I'm not very familiar with dictionaries in Python. However, I have structured text data (ASCII) that I would like to convert to CSV (to load into a database or spreadsheet). Not all values are available in each line:
name Smith city Boston country USA
name Meier city Berlin ZIP 12345 country Germany
name Grigoriy country Russia
Field values never contain spaces. How can I convert such a text file into a CSV like this:
name, city, ZIP, country
Smith, Boston, , USA
Meier, Berlin, 12345, Germany
Grigoriy, , , Russia
Try this:
d = """name Smith city Boston country USA
name Meier city Berlin ZIP 12345 country Germany
name Grigoriy country Russia"""
keys = {}  # will collect all keys
objs = []  # will collect all lines
for line in d.split("\n"):  # split input by linebreak
    ks = line.split()[::2]   # even positions: 0, 2, 4, 6
    vs = line.split()[1::2]  # odd positions: 1, 3, 5, 7
    objs.append(dict(zip(ks, vs)))  # turn line into dictionary
    for key in ks:
        keys[key] = True  # note all keys
print(",".join(keys))  # print header row
for obj in objs:
    print(",".join([obj.get(k, "") for k in keys]))
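Output (on Python 3.7+, where dicts keep insertion order; older versions may print the columns in a different order):
name,city,country,ZIP
Smith,Boston,USA,
Meier,Berlin,Germany,12345
Grigoriy,,Russia,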
Getting the columns in another order is left as an exercise to the reader.
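For reference, one way to get a fixed column order is the standard library's `csv.DictWriter`, which takes the column list up front and fills missing fields with `restval`. The function name `records_to_csv` is a hypothetical helper, and the sketch assumes the column names are known in advance.

```python
import csv
import io


def records_to_csv(text, fieldnames):
    """Convert 'key value key value ...' lines to CSV text with fixed columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            restval="", lineterminator="\n")
    writer.writeheader()
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # Even positions are keys, odd positions are values.
        writer.writerow(dict(zip(tokens[::2], tokens[1::2])))
    return buffer.getvalue()


sample = """name Smith city Boston country USA
name Meier city Berlin ZIP 12345 country Germany
name Grigoriy country Russia"""

csv_text = records_to_csv(sample, ["name", "city", "ZIP", "country"])
print(csv_text)
```

By default `DictWriter` raises a `ValueError` if a line contains a key that is not in `fieldnames`, which helps catch unexpected fields early.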