I have an existing excel. That looks like
and I have another excel that has around 40000 rows and around 300 columns. shortened version looks like
I would like to append values to my existing excel from second excel. But only values that match values in col4 from my existing excel. So i would get something like this
Hope you guys get the picture of what I am trying to do.
yes, that is possible in pandas and it is way faster than anything in excel
df_result = pd.merge(FirstTable, SecondTable, how='left', on='col4')
this will look into both the tables for column "col4" so it needs to be named this way in both the tables.
Also be aware of the fact that if you have multiple values in second table for single value in the first table it will make as many lines in the result as in the second table.
to read the excel you can use:
import pandas as pd
xl=pd.ExcelFile('MyFile.xlsx')
FirstTable = pd.read_excel(xl, 'sheet_name_FIRST_TABLE')
for more detailed description see documentation
Related
I am trying to read an excel file and write every fourth row into a new Excel file. I'm using Pandas to read and write, and if int(num%4) == 0 to determine which rows to select, but the iteration and subsequent writing continue to escape me. I've tried my best to look up answers, but I'm a new programmer and struggling :/
If you're using Pandas I'm assuming you've loaded the data into a dataframe?
If so then consider this:
import pandas as pd
df = pd.read_csv('YourFile.csv')
df.iloc[::4]
#once you're done with the data you can save it to another csv file
df.to_csv('OutputFile.csv')
This will leave your dataframe df with the 4th, 8th, 12th, etc. rows from your original dataframe/file. You can then read/write to each row left in the dataframe df. To visualize the before and after just insert df.head() before and after the df.iloc[::4] expression.
I did not understand what the problem is to be more specific, but you should try pandas' iloc property (or even loc depending on your df), check more info in here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
I have a small dataframe that I would like to sort in Python. When sorting in Python, I get a different result then when sorting in Excel. I would like to sort from A to Z and have the pandas result match what is outputted in Excel.
Here is the code I used:
import pandas as pd
df = pd.DataFrame({"Col": ['0123A', '0123B', '01-AB']})
df = df.sort_values('Col', ascending=True)
Here is the output in python:
Col
2 01-AB
0 0123A
1 0123B
My output in excel is this:
Col
0 0123A
1 0123B
2 01-AB
Is there a reason why the pandas and excel result don't match?
Yes. There is a reason that the pandas sort and the Excel sort are different. The pandas sort is utilizing an ASCIII/UTF sorting hierarchy, which probably is what you expected; whereas, Excel treats the minus sign/hyphen differently in the sorting process. If you want the Excel spreadsheet to sort in the same manner as the pandas sort, you need to utilize some additional definitions and processing like what is explained if you connect to this link.
Excel Hyphen Sorting
You might need to read through the solution a few times for it to register, but sorting columns where cell contents can contain a hyphen requires a bit of extra work in Excel.
I hope that helps.
Regards.
im kinda new to pandas and stuck at how to refer a column within same name under different merged column. here some example which problem im stuck about. i wanna refer a database from worker at company C. but if im define this excel as df and
dfcompanyAworker=df[Worker]
it wont work
is there any specific way to define a database within identifical column like this ?
heres the table
https://i.stack.imgur.com/8Y6gp.png
thanks !
first read the dataset that will be used, then set the shape for example I use excel format
dfcompanyAworker = pd.read_excel('Worker', skiprows=1, header=[1,2], index_col=0, skipfooter=7)
dfcompanyAworker
where:
skiprows=1 to ignore the title row in the data
header=[1, 2] is a list because we have multilevel columns, namely Category (Company) and other data
index_col=0 to make the Date column an index for easier processing and analysis
skipfooter=7 to ignore the footer at the end of the data line
You can follow or try the steps as I made the following
I'm new to python and just trying to redo my first project from matlab. I've written a code in vscode to import an excel file using pandas
filename=r'C:\Users\user\Desktop\data.xlsx'
sheet=['data']
with pd.ExcelFile(filename) as xls:
Dateee=pd.read_excel(xls, sheet,index_col=0)
Then I want to access data in a row and column.
I tried to print data using code below:
for key in dateee.keys():
print(dateee.keys())
but this returns nothing.
Is there anyway to access the data (as a list)?
You can iterate on each column, making the contents of each a list:
for c in df:
print(df[c].to_list())
df is what the dataframe was assigned as. (OP had inconsistent syntax & so I didn't use that.)
Look into df.iterrows() or df.itertuples() if you want to iterate by row. Example:
for row in df.itertuples():
print(row)
Look into df.iloc and df.loc for row and column selection of individual values, see Pandas iloc and loc – quickly select rows and columns in DataFrames.
Or df.iat or df.at for getting or setting single values, see here, here, and here.
My excel has many columns and rows data. But I want to import specific columns and rows data.
My code:
L_pos_org = pd.read_excel('EXCELFILE.xlsx',sheet_name='Sheet1',na_values=['NA'],usecols = "M:U")
Above code extract the columns that I want but it also extracts all rows.
In above excel file, I am trying to extract the data of Columns M:U and rows 106:114.
How to extract this?
Looking at the documentation here, it seems that with a recent enough version of Pandas you could extract a specific block of rows using the parameters skiprows and nrows. I think the command would look something like
pd.read_excel('EXCELFILE.xlsx',sheet_name='Sheet1',header=None,na_values=['NA'],usecols="M:U",skiprows=range(105),nrows=9)