I know how to read the whole dataset, I know how to read a part of it, but it always reads all of the columns from my excel file. I do it like this:
myfile = pd.ExcelFile('my_file.xlsx')
myfile.parse(2, skiprows=14, skipfooter= 2).dropna(axis=1, how='all')
But I cannot read just one specific cell this way, because it reads the whole row. Is there a way to limit the parser to one column?
UPDATE:
looking for a Pandas solution
Update pandas to 0.24.2 or later.
Docs: read_excel, specifically the usecols parameter.
I believe you will need a combination of skiprows and skipfooter to narrow the read down to a specific row, plus usecols to pick the column. That way you get the value of the specific cell.
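As a rough sketch of that combination, the helper below reads a single cell. It assumes a reasonably recent pandas with an Excel engine such as openpyxl installed. The name read_single_cell and its arguments are made up for illustration, and nrows=1 is used here in place of skipfooter to stop after one row.

```python
import pandas as pd


def read_single_cell(path, sheet, row, col_letter):
    """Return the value of one cell (hypothetical helper).

    row is the zero-based position of the row in the sheet,
    col_letter is an Excel column letter such as "C".
    """
    df = pd.read_excel(
        path,
        sheet_name=sheet,
        header=None,         # treat every row as data, not as a header
        skiprows=row,        # skip everything above the wanted row
        nrows=1,             # read just that one row
        usecols=col_letter,  # restrict parsing to one column
    )
    return df.iat[0, 0]
```

For example, read_single_cell('my_file.xlsx', 2, 14, 'C') would return the cell in column C on the 15th row of the third sheet.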
I am trying to read an excel file and write every fourth row into a new Excel file. I'm using Pandas to read and write, and if int(num%4) == 0 to determine which rows to select, but the iteration and subsequent writing continue to escape me. I've tried my best to look up answers, but I'm a new programmer and struggling :/
If you're using Pandas I'm assuming you've loaded the data into a dataframe?
If so then consider this:
import pandas as pd
df = pd.read_csv('YourFile.csv')
# keep every fourth row; iloc returns a new object, so assign the result back
df = df.iloc[3::4]
# once you're done with the data you can save it to another csv file
df.to_csv('OutputFile.csv')
This will leave your dataframe df with the 4th, 8th, 12th, etc. rows from your original dataframe/file (use df.iloc[::4] instead if you want the 1st, 5th, 9th, etc.). You can then read/write each row left in df. To visualize the before and after, call df.head() before and after the df.iloc[3::4] line.
I don't fully understand what the specific problem is, but you should try pandas' iloc indexer (or loc, depending on your df). More info here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
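To make the row selection concrete, here is a small sketch on made-up data (the column name row_number is only for illustration). It shows a slice with iloc and a boolean mask that is closer to the num % 4 == 0 idea from the question. Both pick the 4th, 8th and 12th rows.

```python
import numpy as np
import pandas as pd

# Made-up data: twelve rows numbered 1..12
df = pd.DataFrame({"row_number": range(1, 13)})

# Slice form: start at position 3 (the 4th row), then step by 4
by_slice = df.iloc[3::4]

# Mask form: build 1-based row numbers and keep the multiples of 4
positions = np.arange(1, len(df) + 1)
by_mask = df[positions % 4 == 0]

print(by_slice["row_number"].tolist())  # [4, 8, 12]
```

Either result can then be written out with to_csv, or with to_excel if an Excel writer engine is installed.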
I have a classic pandas DataFrame made of ID and Text. I would like to get just one column, so I use the typical df["columnname"]. But at that point it becomes a pandas Series. Is there a way to make a new DataFrame with just that single column?
I'm asking because if I cast the pandas Series to a string (columnname = columnname.astype("string")) and save it to a text file, only the first sentence of each line is saved, not the entire text content as I would like.
If there is any other solution, I'm open to learning :)
Try this: pd.DataFrame(dfname["columnname"])
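Two other common ways to get a one-column DataFrame are double brackets and Series.to_frame(). The sketch below uses made-up data. The last step rests on an assumption: that the truncated file came from writing the Series' abbreviated printed form, in which case to_csv keeps the full text.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "ID": [1, 2],
    "Text": ["First sentence. Second sentence.", "Another fairly long text."],
})

as_series = df["Text"]              # single brackets give a Series
as_frame = df[["Text"]]             # double brackets give a one-column DataFrame
also_frame = df["Text"].to_frame()  # same result, starting from the Series

# to_csv writes every value in full; with no path it returns the text as a string
text = as_frame.to_csv(index=False, header=False)
```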
I am working on an Excel file, and pandas shows it like this.
How do I get rid of all the Unnamed rows?
This will do the trick
# str() guards against column labels that are not strings
remove_cols = [col for col in gd.columns if 'Unnamed' in str(col)]
gd.drop(remove_cols, axis='columns', inplace=True)
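The same idea can be tried without the original file. This sketch builds a tiny made-up CSV whose header has empty cells, which pandas labels as Unnamed columns, and then keeps only the named columns without modifying gd in place.

```python
import io

import pandas as pd

# Made-up input: the first and last header cells are empty
raw = ",a,\n1,2,3\n"
gd = pd.read_csv(io.StringIO(raw))
print(list(gd.columns))

# Keep every column whose label does not start with "Unnamed"
is_unnamed = gd.columns.astype(str).str.startswith("Unnamed")
cleaned = gd.loc[:, ~is_unnamed]
print(list(cleaned.columns))
```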
Looking at the result you are getting, the Excel data doesn't start on the first row. It also starts in column B instead of column A.
If you are able to edit the Excel file, I would recommend starting your data at A1 (by removing the empty column A and the empty rows at the top using Excel), as that will make later processing much easier for everyone reading the file.
If this file is not editable (perhaps it is generated by another party), you will need to skip the first couple of rows to read the correct headings:
gd = pd.read_excel(r"D:\gdp.xlsx", skiprows=3, usecols="B:L")
I am trying to export a pandas dataframe with to_csv so it can be processed by another tool before I use it again with Python. It is a token dataset with 5k columns. When exported, the header is split across two rows. This might not be an issue for pandas, but in this case I need the header on a single row of the CSV. Is this a pandas limitation or a CSV format one?
So far, searching has returned no suitable results. The only solution I came up with is writing the column names and the values separately, e.g. writing a list of column names as strings first and then a numpy array to the CSV. Can this be implemented, and if so, how?
For me this problem was caused by having multi-level column headers (a MultiIndex on the columns). The easiest way to resolve it is to specify your own headers. I found a reference to an option called tupleize_cols, but it doesn't exist in current (1.2.2) pandas.
I was using the following aggregation:
df.groupby(["device"]).agg({
"outage_length":["count","sum"],
}).to_csv("example.csv")
This resulted in the following csv output:
,outage_length,outage_length
,count,sum
device,,
device0001,3,679.0
device0002,1,113.0
device0003,2,400.0
device0004,1,112.0
I specified my own headers in the call to to_csv, leaving out the groupby key, as follows:
}).to_csv("example.csv",header=("flaps","downtime"))
And got the following csv output, which was much more pleasing to spreadsheet software:
device,flaps,downtime
device0001,3,679.0
device0002,1,113.0
device0003,2,400.0
device0004,1,112.0
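Another way to end up with a single header row is to avoid the two-level columns in the first place by using named aggregation (available since pandas 0.25). This is a sketch on made-up data, not the exact data from the output above. The names flaps and downtime are reused from the answer.

```python
import pandas as pd

# Made-up outage records for illustration
df = pd.DataFrame({
    "device": ["device0001", "device0001", "device0002"],
    "outage_length": [100.0, 50.0, 113.0],
})

# Named aggregation yields flat column names, so the CSV header is one row
summary = df.groupby("device").agg(
    flaps=("outage_length", "count"),
    downtime=("outage_length", "sum"),
)

csv_text = summary.to_csv()  # no path given, so the CSV is returned as a string
print(csv_text)
```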
Any idea why the code below can't keep the first column of my CSV file? I would like to keep several columns in a new CSV file, first column included. But when I include the name of the first column in the list,
I get an error:
"Type" not index.
import pandas as pd
f = pd.read_csv("1.csv")
keep_col = ['Type','Pol','Country','User Site Code','PG','Status']
new_f = f[keep_col]
new_f.to_csv("2.csv", index=False)
Thanks a lot.
Try f.columns.values.tolist() and check how the first column name comes out. It sounds like there is an encoding issue when you read the CSV. You can try specifying the encoding option in pd.read_csv() to see if that gets rid of the extra characters at the front. Otherwise, you can use f = f.rename(columns={'F48FBFBFType':'Type'}) to change whatever the current name of your first column is to simply 'Type'.
You are better off specifying the columns to read from your CSV file with usecols:
pd.read_csv('1.csv', usecols=keep_col).to_csv("2.csv", index=False)
Do you have any special characters in your first column?
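One way to check for hidden characters is to print the repr of each column name and then clean the labels before selecting. The sketch below simulates the problem with a made-up frame whose first header starts with a byte-order mark; whether that is what happened in 1.csv is an assumption. If the file really does start with a BOM, passing encoding="utf-8-sig" to pd.read_csv is a commonly used way to have it removed while reading.

```python
import pandas as pd

# Made-up frame: the first label starts with an invisible byte-order mark
f = pd.DataFrame({"\ufeffType": ["A"], "Pol": ["X"], "Country": ["FR"]})

# repr() makes hidden characters in the labels visible
print([repr(c) for c in f.columns])

# Strip the BOM and any surrounding whitespace from every label
f.columns = f.columns.str.replace("\ufeff", "", regex=False).str.strip()

keep_col = ["Type", "Pol"]
new_f = f[keep_col]  # selecting by the clean names now works
```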