Column size issue : read_csv - python

I have a dataframe with 4 columns. I need to convert it to CSV so I can work with it on my local computer, but when I read the CSV back in I get only one column:
df = pd.read_csv("final.csv")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20479 entries, 0 to 20478
Data columns (total 1 columns)
How can I read this CSV into a dataframe with 4 columns?

The question is unclear. Are you trying to write a pandas DataFrame to a CSV file, or to create a DataFrame from an existing CSV file?
The linked "Pandas Dataframe to CSV" answer should be sufficient for writing a df to a CSV file, and "Dataframe from CSV" covers the reverse.
A CSV (comma-separated values) file is normally delimited by commas, so make sure the separator used when writing the file matches the one used when reading it.

When you read in your dataframe, you might have to explicitly state what type of separator is being used. I would open the csv in a text editor and see what the separator is. If, for example, the separator used was "|", I would use the following code:
df = pd.read_csv('final.csv', sep='|')
Then, to save to a .csv the code should be as simple as:
df.to_csv('path/to/file/csvFileName.csv', index=False)
I recommend passing index=False as I did; otherwise the pandas index is written as an extra column in your CSV file. Cheers.
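As a minimal sketch of the separator issue described above: the in-memory text below is a hypothetical stand-in for final.csv (the '|' separator and the column names a, b, c, d are assumptions, not taken from the question). It shows how the default comma separator collapses every line into one column, and how naming the separator recovers all four.

```python
import io

import pandas as pd

# Hypothetical stand-in for final.csv: four fields separated by '|'.
raw = "a|b|c|d\n1|2|3|4\n5|6|7|8\n"

# The default sep=',' finds no commas, so each line becomes one column.
wrong = pd.read_csv(io.StringIO(raw))

# Naming the actual separator recovers the four columns.
right = pd.read_csv(io.StringIO(raw), sep="|")

print(wrong.shape)
print(right.shape)

# Round trip: write without the index, then read back with the default comma.
buffer = io.StringIO()
right.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)
```

Because to_csv writes commas by default, the file produced by the round trip can be read back without any sep argument.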

Related

Saving each DataFrame column to separate CSV files

I have some dataframes, one of them is the following:
L_M_P = pd.read_csv('L_M_P.csv') # dimensions 17520x33
I would like to be able to save each column as an independent csv file, without having to do it manually as follows:
L_M_P.iloc[:, 0].to_csv('column0.csv')
L_M_P.iloc[:, 1].to_csv('column1.csv')
...
In that case, I would have 33 new '.csv' files, each with dimensions 17520x1.
You can iterate over the columns and write each one to its own file:
for column in L_M_P.columns:
    L_M_P[column].to_csv(column + '.csv')
Note: I am assuming the language is Python, since the question uses pd and all of the code shown is pandas.
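A self-contained sketch of the same loop, under a few assumptions: the two-column frame, its column names, and the temporary output directory are hypothetical stand-ins for L_M_P and its 33 columns. Selecting with double brackets keeps a one-column DataFrame, and index=False leaves the row index out of each file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical small stand-in for the 17520x33 frame in the question.
frame = pd.DataFrame({"load": [1, 2, 3], "price": [0.5, 0.7, 0.9]})

# Write into a temporary directory so the sketch leaves no stray files behind.
out_dir = Path(tempfile.mkdtemp())

written = []
for column in frame.columns:
    path = out_dir / f"{column}.csv"
    # frame[[column]] is a one-column DataFrame; index=False omits the row index.
    frame[[column]].to_csv(path, index=False)
    written.append(path)

print([p.name for p in written])
```

Each resulting file then holds one header line followed by the values of a single column.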

how to make pandas not read display values from csv?

I have a date column in a CSV file, as shown below:
23/6/2011 7:00
21/4/1998 05:00
17/02/1990
11/01/1985 30:30:01
26/02/1976
45:42:7
The problem is that when I double-click a cell in the CSV, the actual date value is displayed correctly, e.g. 15/02/2010 10:30:00.
My CSV looks like this: [screenshot not included]
I cannot fix this manually because I have 20-30 CSV files and there are a lot of rows like this.
When I read the column into a pandas dataframe and apply the datetime conversion below, I get an error:
df['Date'] = pd.to_datetime(df['Date'])
ParserError: hour must be in 0..23: 55:45.0
How can I make pandas read the actual value rather than the value the CSV displays?
I tried changing the cell format in Excel, but that doesn't help.
Basically, I want pandas to read the value shown on double-click, not the displayed value.
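pandas only sees the text stored in the CSV file, so one possible first step is to find out which rows fail to parse. The sketch below is not a full answer: the sample values are copied from the question, while the list of candidate formats is an assumption. It tries each format in turn and leaves NaT where none of them match.

```python
import pandas as pd

# Sample values copied from the question.
raw = pd.Series([
    "23/6/2011 7:00",
    "21/4/1998 05:00",
    "17/02/1990",
    "11/01/1985 30:30:01",
    "26/02/1976",
    "45:42:7",
])

# Assumed candidate formats, tried in order (day first, as in the samples).
formats = ["%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M", "%d/%m/%Y"]

parsed = None
for fmt in formats:
    # errors="coerce" yields NaT instead of raising on a mismatch.
    attempt = pd.to_datetime(raw, format=fmt, errors="coerce")
    parsed = attempt if parsed is None else parsed.fillna(attempt)

# Rows that matched none of the formats and need a closer look.
unparsed = raw[parsed.isna()]
print(unparsed.tolist())
```

The rows left in unparsed are the ones whose stored text is not a valid date under any of the assumed formats, such as an hour outside 0..23.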

How to read an excel file with nested columns in pandas

Using pandas, I'm trying to read an Excel file that looks like the following:
[image: another sample]
I tried reading the Excel file the regular way: df = pd.read_excel('filename.xlsx', skiprows=6).
The problem is that I don't get all the column names I need, and most of the column names come back like Unnamed: 1.
Is there a way to solve this and read all the columns? Or an approach where I can convert it to a JSON file?

Working with csv files that have delimiters

I have to read data from a CSV file, and I want to convert two columns using one-hot encoding.
The data in the CSV file is separated by ';' (e.g. CITY;MONTH;SALES_AMOUNT), so everything ends up in one column. How do I load this into a pandas dataframe with separate columns?
Desired result, e.g.: CITY MONTH SALES_AMOUNT
Instead of: CITY;MONTH;SALES_AMOUNT
You can use the delimiter parameter when reading the CSV file.
import pandas as pd
df = pd.read_csv('dataset.csv', delimiter=';')
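A short sketch that combines the delimiter fix with the one-hot encoding the question mentions. The sample rows are made up for illustration, and pd.get_dummies is one common way to one-hot encode, not necessarily what the asker ended up using.

```python
import io

import pandas as pd

# Made-up sample rows using the header from the question.
raw = "CITY;MONTH;SALES_AMOUNT\nOslo;Jan;100\nRome;Feb;250\nOslo;Feb;175\n"

# sep=';' splits each line into three columns (delimiter=';' is an alias).
sales = pd.read_csv(io.StringIO(raw), sep=";")

# One-hot encode the two categorical columns; SALES_AMOUNT is left untouched.
encoded = pd.get_dummies(sales, columns=["CITY", "MONTH"])

print(list(sales.columns))
print(encoded.shape)
```

Each distinct city and month becomes its own indicator column, named with the original column name as a prefix.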

How to read data from excel from a particular column in python

I have an Excel sheet that I am reading with pandas in Python.
I want to read the file based on one column: if the column has a value in a given row, skip that row; if the column is empty, read the row and store its values in a list.
[image: Excel Example]
In the image above, when uniqueidentifier is yes the row should not be read, but when it is empty the row should be read.
How can I do that in Python, and how can I get the row index so that, after processing a row, I can write back to the blank unique identifier column to mark that row as read?
This is possible for CSV files. There you could do:
iter_csv = pd.read_csv('file.csv', iterator=True, chunksize=100000)
df = pd.concat([chunk[chunk['UniqueIdentifier'] == 'True'] for chunk in iter_csv])
However, pd.read_excel does not offer an option to return an iterator object. Some other Excel readers might, but I don't know which ones. In any case, you could export your Excel file as CSV and use the CSV solution.
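A sketch of the CSV route adapted to what the question asks for, namely keeping the rows whose identifier is empty. The sample data, the Name column, and the tiny chunk size are assumptions for illustration; the UniqueIdentifier column name follows the snippet above.

```python
import io

import pandas as pd

# Made-up sample: the second and fourth rows have an empty UniqueIdentifier.
raw = "Name,UniqueIdentifier\nalpha,yes\nbeta,\ngamma,yes\ndelta,\n"

# Read in chunks (tiny here for illustration) and keep only the unread rows.
chunks = pd.read_csv(io.StringIO(raw), chunksize=2)
unread = pd.concat(
    [chunk[chunk["UniqueIdentifier"].isna()] for chunk in chunks]
)

names = unread["Name"].tolist()
# The chunk index continues across chunks, so these are file row positions.
row_positions = unread.index.tolist()

# After processing, mark those rows as read in the full table.
full = pd.read_csv(io.StringIO(raw))
full.loc[row_positions, "UniqueIdentifier"] = "yes"

print(names)
print(row_positions)
```

Writing full back out with to_csv(..., index=False) would then persist the updated identifier column.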
