How to import specific columns and rows from Excel using Pandas - python

My Excel file has many columns and rows of data, but I only want to import specific columns and rows.
My code:
L_pos_org = pd.read_excel('EXCELFILE.xlsx',sheet_name='Sheet1',na_values=['NA'],usecols = "M:U")
The above code extracts the columns that I want, but it also extracts all rows.
From this Excel file, I am trying to extract the data in columns M:U and rows 106:114.
How can I extract this?

Looking at the documentation here, it seems that with a recent enough version of Pandas you could extract a specific block of rows using the parameters skiprows and nrows. I think the command would look something like
pd.read_excel('EXCELFILE.xlsx',sheet_name='Sheet1',header=None,na_values=['NA'],usecols="M:U",skiprows=range(105),nrows=9)
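A minimal, self-contained version of that call, using the file and sheet names from the question, might look like this:
import pandas as pd

# skiprows=range(105) skips the first 105 rows, so Excel row 106 is the first row read;
# nrows=9 then reads rows 106-114, and usecols="M:U" keeps only those columns
L_pos_org = pd.read_excel(
    'EXCELFILE.xlsx',
    sheet_name='Sheet1',
    header=None,
    na_values=['NA'],
    usecols="M:U",
    skiprows=range(105),
    nrows=9,
)
print(L_pos_org)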

Related

Python Iteratively Read and Write Rows

I am trying to read an Excel file and write every fourth row into a new Excel file. I'm using Pandas to read and write, and if int(num % 4) == 0 to determine which rows to select, but the iteration and subsequent writing continue to escape me. I've tried my best to look up answers, but I'm a new programmer and struggling :/
If you're using Pandas I'm assuming you've loaded the data into a dataframe?
If so then consider this:
import pandas as pd
df = pd.read_csv('YourFile.csv')
# keep every fourth row (positions 0, 4, 8, ...); assign the result back so df is actually filtered
df = df.iloc[::4]
# once you're done with the data you can save it to another csv file
df.to_csv('OutputFile.csv')
This will leave your dataframe df with every fourth row of your original dataframe/file (positions 0, 4, 8, ..., which is where int(num % 4) == 0 holds). You can then read/write each row left in the dataframe df. To visualize the before and after, just insert df.head() before and after the df = df.iloc[::4] line.
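Since the question mentions Excel files specifically, a similar sketch using read_excel/to_excel (the file names are placeholders, and .xlsx support needs the openpyxl package) would be:
import pandas as pd

df = pd.read_excel('YourFile.xlsx')
# keep every fourth row, starting with the first (positions 0, 4, 8, ...)
every_fourth = df.iloc[::4]
every_fourth.to_excel('OutputFile.xlsx', index=False)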
I did not fully understand what the specific problem is, but you should try pandas' iloc property (or even loc, depending on your df). More info here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html

Stack multiple columns into one single column in a csv with Python 3.x

I have a large CSV file with multiple columns that I would like to merge into 2 columns using Python.
What I have:
ID.12345 ID.45678
CVE-xxxx-1234 CVE-xxxx-5678
CVE-xxxx-3456
What I need:
ID CVE
ID.12345 CVE-xxxx-1234
ID.12345 CVE-xxxx-3456
ID.45678 CVE-xxxx-5678
I looked through several solutions here but I'm not sure where to start (coding n00b). This one looks closest to what I need, but there the data is already in a Pandas dataframe at the start, while I only have the CSV. Do I need Pandas? Do I need to create a dataframe from the CSV file? Can this be done using only Python's csv library? HELP
P.S. The CSV has 1000+ columns, if that is of any significance.
To get your expected result run:
result = df.melt(var_name='ID', value_name='CVE').dropna()
The result, for your data sample is:
ID CVE
0 ID.12345 CVE-xxxx-1234
1 ID.12345 CVE-xxxx-3456
2 ID.45678 CVE-xxxx-5678
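A complete sketch of that approach, assuming the wide file is called input.csv and its first row holds the ID column names:
import pandas as pd

# the first row of the CSV (ID.12345, ID.45678, ...) becomes the column names
df = pd.read_csv('input.csv')

# melt turns every column name into a value of 'ID' and every cell into a value of 'CVE';
# dropna removes the empty cells left over from columns with fewer entries
result = df.melt(var_name='ID', value_name='CVE').dropna()

result.to_csv('output.csv', index=False)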

Read specific excel cell value without loading the whole sheet to dataframe?

I know how to read the whole dataset and how to read a part of it, but it always reads all of the columns from my Excel file. I do it like this:
myfile = pd.ExcelFile('my_file.xlsx')
myfile.parse(2, skiprows=14, skipfooter=2).dropna(axis=1, how='all')
But I cannot read just one specific cell this way, because it reads the whole row. Is there a way to limit the parser to one column?
UPDATE:
looking for a Pandas solution
Update your pandas to 0.24.2:
Docs: read_excel; in particular, read about usecols.
I believe you will need to use a combination of skiprows and skipfooter to narrow down to the specific row, and usecols to get the column. This way you will get the specific cell's value.
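A rough sketch of that idea, reading just the cell in column C of Excel row 15 of the third sheet (the cell position is made up for illustration, and nrows=1 is used here instead of skipfooter):
import pandas as pd

# usecols="C" keeps a single column, skiprows=14 skips the 14 rows above the target,
# nrows=1 reads a single row, and header=None stops pandas treating that cell as a header
cell = pd.read_excel(
    'my_file.xlsx',
    sheet_name=2,
    usecols="C",
    skiprows=14,
    nrows=1,
    header=None,
).iat[0, 0]
print(cell)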

How to write dataframe to csv with a single row header(5k columns)?

I am trying to export a pandas dataframe with to_csv so it can be processed by another tool before using it again with Python. It is a token dataset with 5k columns. When exported, the header is split into two rows. This might not be an issue for pandas, but in this case I need to export it as a CSV with a single header row. Is this a pandas limitation or a CSV format one?
Currently, searching returned no compatible results. The only solution I came up with is writing the column names and the values separately, e.g. writing a str column list first and then a numpy array to the CSV. Can this be implemented, and if so, how?
For me this problem was caused by the aggregation producing a MultiIndex on the columns. The easiest way to resolve the issue is to specify your own headers. I found reference to an option called tupleize_cols, but it no longer exists in current (1.2.2) pandas.
I was using the following aggregation:
df.groupby(["device"]).agg({
"outage_length":["count","sum"],
}).to_csv("example.csv")
This resulted in the following csv output:
,outage_length,outage_length
,count,sum
device,,
device0001,3,679.0
device0002,1,113.0
device0003,2,400.0
device0004,1,112.0
I specified my own headers in the call to to_csv, excluding my groupby key, as follows:
}).to_csv("example.csv", header=("flaps", "downtime"))
And got the following csv output, which was much more pleasing to spreadsheet software:
device,flaps,downtime
device0001,3,679.0
device0002,1,113.0
device0003,2,400.0
device0004,1,112.0
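Put together, a small self-contained sketch of this approach (the data below is made up to roughly match the output above):
import pandas as pd

df = pd.DataFrame({
    "device": ["device0001", "device0001", "device0001", "device0002"],
    "outage_length": [300.0, 266.0, 113.0, 113.0],
})

# the aggregation produces a MultiIndex on the columns, which to_csv would write as two header rows;
# passing header= replaces the column labels with a single flat row
df.groupby(["device"]).agg({
    "outage_length": ["count", "sum"],
}).to_csv("example.csv", header=("flaps", "downtime"))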

Is this possible in python with pandas or xlwt

I have an existing Excel file that looks like
and I have another Excel file that has around 40,000 rows and around 300 columns; a shortened version looks like
I would like to append values to my existing Excel file from the second one, but only the values that match values in col4 of my existing file. So I would get something like this
Hope you guys get the picture of what I am trying to do.
Yes, that is possible in pandas, and it is way faster than anything in Excel:
df_result = pd.merge(FirstTable, SecondTable, how='left', on='col4')
This will look for the column "col4" in both tables, so it needs to be named this way in both of them.
Also be aware that if there are multiple matching rows in the second table for a single value in the first table, the result will contain as many rows as there are matches in the second table.
To read the Excel files you can use:
import pandas as pd
xl = pd.ExcelFile('MyFile.xlsx')
FirstTable = pd.read_excel(xl, 'sheet_name_FIRST_TABLE')
For a more detailed description, see the documentation.
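A fuller sketch along those lines, reading both sheets, merging on col4, and writing the result back out (the second sheet name and the output file name are placeholders):
import pandas as pd

xl = pd.ExcelFile('MyFile.xlsx')
FirstTable = pd.read_excel(xl, 'sheet_name_FIRST_TABLE')
SecondTable = pd.read_excel(xl, 'sheet_name_SECOND_TABLE')  # placeholder sheet name

# left merge keeps every row of FirstTable and appends the matching
# SecondTable columns wherever the col4 values line up
df_result = pd.merge(FirstTable, SecondTable, how='left', on='col4')

df_result.to_excel('MyResult.xlsx', index=False)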
