I have an Excel file that has 99 sheets, and every sheet has a million rows and six columns. How can I use Python to search for a word and get back the row containing it?
example:
0 Name Age Class
1 Michael Jackson 17 B
2 Barak Obama 38 A
3 Ariana Grande 82 C
I want a function that takes a name and gives me back the name, age, and class, like this:
Michael Jackson,17,B
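A minimal sketch of such a function with pandas, assuming the column names Name, Age, and Class from the example (the file name data.xlsx is a placeholder):

import pandas as pd

def find_row(path, name):
    # Scan every sheet for a row whose Name matches, and return 'Name,Age,Class'.
    xls = pd.ExcelFile(path)
    for sheet in xls.sheet_names:
        df = xls.parse(sheet)
        match = df[df["Name"] == name]
        if not match.empty:
            row = match.iloc[0]
            return f"{row['Name']},{row['Age']},{row['Class']}"
    return None

print(find_row("data.xlsx", "Michael Jackson"))  # Michael Jackson,17,B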
Related
I have an Excel file which includes 5 sheets. I need to create 5 graphs, plotting x and y for each, and I want to do it in a loop. How can I do that?
You can load all the sheets:
import pandas as pd

f = pd.ExcelFile('users.xlsx')
Then extract sheet names:
>>> f.sheet_names
['User_info', 'purchase', 'compound', 'header_row5']
Now you can loop over the sheet names above. For example, here is one sheet:
>>> f.parse(sheet_name='User_info')
User Name Country City Gender Age
0 Forrest Gump USA New York M 50
1 Mary Jane CANADA Toronto F 30
2 Harry Porter UK London M 20
3 Jean Grey CHINA Shanghai F 30
The loop looks like this:
for name in f.sheet_names:
    df = f.parse(sheet_name=name)
    # do something with each sheet's DataFrame here
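For the plotting part of the question, a minimal sketch with matplotlib, assuming each sheet has columns named x and y (substitute your real column names):

import matplotlib.pyplot as plt

for name in f.sheet_names:
    df = f.parse(sheet_name=name)
    plt.plot(df['x'], df['y'], label=name)  # one line per sheet

plt.legend()
plt.show()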
There is no need for separate variables; create the output lists and use this simple loop:
import pandas as pd

data = pd.ExcelFile("DCB_200_new.xlsx")
sheets = ['DCB_200_9', 'DCB_200_15', 'DCB_200_23', 'DCB_200_26', 'DCB_200_28']
x = []
y = []
for e in sheets:
    x.append(pd.read_excel(data, sheet_name=e, usecols=[2], skiprows=[0, 1]))
    y.append(pd.read_excel(data, sheet_name=e, usecols=[1], skiprows=[0, 1]))
But ideally you should be able to load the data only once and loop over the sheets/columns, as sketched below. Please update your question with more info.
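For example, here is a sketch that reads each sheet a single time and splits out the two columns afterwards (same sheet names and column positions as above):

import pandas as pd

data = pd.ExcelFile("DCB_200_new.xlsx")
sheets = ['DCB_200_9', 'DCB_200_15', 'DCB_200_23', 'DCB_200_26', 'DCB_200_28']
x = []
y = []
for name in sheets:
    # One read per sheet; keep only columns 1 and 2 and skip the two header rows.
    df = data.parse(name, usecols=[1, 2], skiprows=[0, 1])
    y.append(df.iloc[:, 0])  # original column 1
    x.append(df.iloc[:, 1])  # original column 2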
I have a large df called data which looks like:
Identifier Surname First names(s) Date change Work Pattern Region
0 12233.0 Smith Bob FT NW
1 54213.0 Jones Sally 15/04/15 FT NW
2 12237.0 Evans Steve 26/08/14 FT SE
3 10610.0 Cooper Amy 16/08/12 FT SE
I have another dataframe called updates. In this example the dataframe has updated information for data for a couple of records and looks like:
Identifier Surname First names(s) Date change
0 12233.0 Smith Bob 05/09/14
1 10610.0 Cooper Amy 16/08/12
I'm trying to find a way to update data with the updates df so the resulting dataframe looks like:
Identifier Surname First names(s) Date change Work Pattern Region
0 12233.0 Smith Bob 15/09/14 FT NW
1 54213.0 Jones Sally 15/04/15 FT NW
2 12237.0 Evans Steve 26/08/14 FT SE
3 10610.0 Cooper Amy 16/08/12 FT SE
As you can see the Date change field for Bob in the data df has been updated with the Date change from the updates df.
What can I try next?
A while back I was dealing with this too. The straight-up .update was giving me issues (sorry, I can't remember the exact problem; I think it was that .update relies on the indexes matching, and they didn't match in my two separate dataframes, so I wanted to use certain columns as my index to update on).
But I made a function to deal with it. It might be more than what's needed here, but try this and see if it works.
I'm also assuming the date you want to update from the updates dataframe should be 15/09/14, not 05/09/14, so I have that different in my sample data below.
Also, I'm assuming the Identifier is a unique key. If not, you'll need to include multiple columns in your unique key.
import pandas as pd

data = pd.DataFrame([[12233.0, 'Smith', 'Bob', '', 'FT', 'NW'],
                     [54213.0, 'Jones', 'Sally', '15/04/15', 'FT', 'NW'],
                     [12237.0, 'Evans', 'Steve', '26/08/14', 'FT', 'SE'],
                     [10610.0, 'Cooper', 'Amy', '16/08/12', 'FT', 'SE']],
                    columns=['Identifier', 'Surname', 'First names(s)',
                             'Date change', 'Work Pattern', 'Region'])

updates = pd.DataFrame([[12233.0, 'Smith', 'Bob', '15/09/14'],
                        [10610.0, 'Cooper', 'Amy', '16/08/12']],
                       columns=['Identifier', 'Surname', 'First names(s)', 'Date change'])

def update(df1, df2, keys_list):
    # Use the key column(s) as the index so .update can align rows.
    df1 = df1.set_index(keys_list)
    df2 = df2.set_index(keys_list)
    # Warn about duplicate keys, which would make the update ambiguous.
    dup_idx1 = df1.index[df1.index.duplicated()].unique()
    dup_idx2 = df2.index[df2.index.duplicated()].unique()
    if len(dup_idx1) > 0 or len(dup_idx2) > 0:
        print('\n' + '#' * 50 + '\nError! Duplicate indices:')
        for element in dup_idx1:
            print('df1: %s' % (element,))
        for element in dup_idx2:
            print('df2: %s' % (element,))
        print('#' * 50 + '\n\n')
    df1.update(df2, overwrite=True)
    df1.reset_index(inplace=True)
    return df1

# The 3rd argument is a list, in case you need multiple columns as your unique key.
df = update(data, updates, ['Identifier'])
Output:
print(data)
Identifier Surname First names(s) Date change Work Pattern Region
0 12233.0 Smith Bob FT NW
1 54213.0 Jones Sally 15/04/15 FT NW
2 12237.0 Evans Steve 26/08/14 FT SE
3 10610.0 Cooper Amy 16/08/12 FT SE
print(updates)
Identifier Surname First names(s) Date change
0 12233.0 Smith Bob 15/09/14
1 10610.0 Cooper Amy 16/08/12
df = update(data, updates, ['Identifier'])
print(df)
Identifier Surname First names(s) Date change Work Pattern Region
0 12233.0 Smith Bob 15/09/14 FT NW
1 54213.0 Jones Sally 15/04/15 FT NW
2 12237.0 Evans Steve 26/08/14 FT SE
3 10610.0 Cooper Amy 16/08/12 FT SE
Using DataFrame.update.
First, set the index:
data.set_index('Identifier', inplace=True)
updates.set_index('Identifier', inplace=True)
Then update:
data.update(updates)
print(data)
Surname First names(s) Date change Work Pattern Region
Identifier
12233.0 Smith Bob 15/09/14 FT NW
54213.0 Jones Sally 15/04/15 FT NW
12237.0 Evans Steve 26/08/14 FT SE
10610.0 Cooper Amy 16/08/12 FT SE
If you need multiple columns to create a unique index, you can just set them with a list. For example:
data.set_index(['Identifier', 'Surname'], inplace=True)
updates.set_index(['Identifier', 'Surname'], inplace=True)
data.update(updates)
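If you then want the key columns back as ordinary columns, reset the index afterwards:

data.reset_index(inplace=True)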
So I'm trying to use pandas in a Python bot, reading from an Excel sheet. The table below is a sample of the Excel file, which has 4 columns and 10,000+ rows across multiple sheets.
Name Marks Rank School
Student1 655 1 Cambridge
Student2 345 2 Cambridge
Student3 554 3 Cambridge
Student4 847 4 St Peter
Student5 343 5 Cambridge
Student6 546 6 St Peter
Student7 755 7 St Peter
Student8 465 8 St Peter
Student9 467 9 Cambridge
I tried all the pandas examples I found via Google search, but everything prints results to the console or a shell.
import pandas as pd

df = pd.read_excel('results.xlsx', sheet_name='Sheet1')
print(df[df["Name"] == "Student5"])
So how do I get info about a particular student into a Discord channel?
For example, student5:
Name Marks Rank School
Student5 343 5 Cambridge
I am not familiar with Discord, but you can get the result as a NumPy array:
myarray = df[df["Name"] == "Student5"].values
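For the Discord side, here is a minimal sketch using the discord.py library (2.x, assumed installed); the command name and the token placeholder are hypothetical:

import discord
import pandas as pd
from discord.ext import commands

df = pd.read_excel('results.xlsx', sheet_name='Sheet1')

intents = discord.Intents.default()
intents.message_content = True  # required so the bot can read command messages
bot = commands.Bot(command_prefix="!", intents=intents)

@bot.command()
async def student(ctx, name: str):
    # Look up the student and send the matching row to the channel.
    match = df[df["Name"] == name]
    if match.empty:
        await ctx.send(f"No student named {name} found.")
    else:
        await ctx.send(match.to_string(index=False))

bot.run("YOUR_BOT_TOKEN")  # hypothetical placeholder

Typing !student Student5 in a channel the bot can see would then reply with the row shown above.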
I wanted to parse a given file line by line. The file has a format of
'name age gender hobby1 hobby2...'.
The first thing that came to mind was to use a named tuple of the form namedtuple('info', ['name', 'age', 'gender', 'hobby']).
How can I save the data in my file to a list of tuples with the corresponding values? I tried using line.split(), but I couldn't figure out how to save the space-separated hobbies to info.hobby.
If I understand you correctly, you can use pandas and pass a single space as the sep, if the data looks like this:
name age gender hobby1 hobby2
steve 12 male xyz abc
bob 29 male swimming golfing
alice 40 female reading cooking
tom 50 male sleeping
and here is the syntax for the method described above:
import pandas as pd

df = pd.read_csv('file.txt', sep=' ')
df.fillna('', inplace=True)
# combine the two hobby columns into one, trimming any trailing space
df['hobby'] = df[['hobby1', 'hobby2']].apply(lambda i: ' '.join(i).strip(), axis=1)
df.drop(['hobby1', 'hobby2'], axis=1, inplace=True)
print(df)
Output:
name age gender hobby
0 steve 12 male xyz abc
1 bob 29 male swimming golfing
2 alice 40 female reading cooking
3 tom 50 male sleeping
EDIT: added your data from comment above
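For completeness, the namedtuple approach from the question also works directly, using extended unpacking so the trailing space-separated hobbies end up in a single field (the file name is a placeholder):

from collections import namedtuple

Info = namedtuple('Info', ['name', 'age', 'gender', 'hobby'])

people = []
with open('file.txt') as f:
    next(f)  # skip the header row, if the file has one
    for line in f:
        # Everything after the third field is collected into hobbies.
        name, age, gender, *hobbies = line.split()
        people.append(Info(name, int(age), gender, ' '.join(hobbies)))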
I am new to Python, so please excuse my question. In my line of work I have to deal with tabular data represented in text files. The values are separated by either a comma or a semicolon. A simplified example of such a file might look like the following:
City;Car model;Color;Registration number
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev;Toyota;Blue;3423
London;Fiat;Red;4545
My goal is to have a script which can tell me how many Mercedes are in Moscow (in this case there are two) and save a new text file Moscow.txt with the following:
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345
I will be very thankful for your help.
I would recommend looking into the pandas library. You can do all sorts of neat manipulations of tabular data. First read it in:
>>> import pandas as pd
>>> df = pd.read_csv("cars.ssv", sep=";")
>>> df
City Car model Color Registration number
0 Moscow Mercedes Red 1234
1 Moscow Mercedes Red 2345
2 Kiev Toyota Blue 3423
3 London Fiat Red 4545
Index it in different ways:
>>> moscmerc = df[(df["City"] == "Moscow") & (df["Car model"] == "Mercedes")]
>>> moscmerc
City Car model Color Registration number
0 Moscow Mercedes Red 1234
1 Moscow Mercedes Red 2345
>>> len(moscmerc)
2
Write it out:
>>> moscmerc.to_csv("moscmerc.ssv", sep=";", header=None, index=None)
>>> !cat moscmerc.ssv
Moscow;Mercedes;Red;1234
Moscow;Mercedes;Red;2345
You can also work on multiple groups at once:
>>> df.groupby(["City", "Car model"]).size()
City Car model
Kiev Toyota 1
London Fiat 1
Moscow Mercedes 2
dtype: int64
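The count for a particular group can then be pulled straight out of that Series:

>>> counts = df.groupby(["City", "Car model"]).size()
>>> counts[("Moscow", "Mercedes")]
2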
Update: @Anthon pointed out that the above only handles the case of a semicolon separator. If a file uses a comma throughout, you can just use , instead of ;, so that's trivial. The more interesting case is an inconsistent delimiter within the file, but that's easily handled too:
>>> !cat cars_with_both.txt
City;Car model,Color;Registration number
Moscow,Mercedes;Red;1234
Moscow;Mercedes;Red;2345
Kiev,Toyota;Blue,3423
London;Fiat,Red;4545
>>> df = pd.read_csv("cars_with_both.txt", sep="[;,]")
>>> df
City Car model Color Registration number
0 Moscow Mercedes Red 1234
1 Moscow Mercedes Red 2345
2 Kiev Toyota Blue 3423
3 London Fiat Red 4545
Update #2: and now the text is in Russian -- of course it is. :^) Still, if everything is correctly encoded, and your terminal is properly configured, that should work too:
>>> df = pd.read_csv("russian_cars.csv", sep="[;,]")
>>> df
City Car model Color Registration number
0 Москва Mercedes красный 1234
1 Москва Mercedes красный 2345
2 Киев Toyota синий 3423
3 Лондон Fiat красный 4545