read csv to pandas retaining values as it is - python

I am trying to read a CSV into a DataFrame, but I want the column values kept exactly as they appear in the file.
For example, my first column has values like 001234 and 003462 in the CSV file, but the DataFrame interprets them as 1234, 3462, etc. How do I retain the '00' at the front?
Please help! Thanks.

Try this:
df = pd.read_csv(file_path, dtype=str)
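A minimal sketch of the difference, using made-up sample data in place of the asker's file (the column names id and amount are assumptions). It also shows passing a dict to dtype, which keeps only the chosen column as text:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the asker's file.
csv_text = "id,amount\n001234,10\n003462,20\n"

# Default parsing: the id column is inferred as integers, so the zeros are lost.
df_default = pd.read_csv(io.StringIO(csv_text))

# dtype=str keeps every column as text, exactly as written in the file.
df_all_str = pd.read_csv(io.StringIO(csv_text), dtype=str)

# A dict limits the override to one column; amount is still parsed as a number.
df_one_col = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

print(df_default["id"].tolist())
print(df_all_str["id"].tolist())
print(df_one_col.dtypes)
```

The per-column form is usually the better fit when other columns still need to be used in arithmetic.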

Related

How to store tuples in a pandas dataframe cell?

I have a CSV file with data stored like this:
username;groups
alice;(admin,user)
bob;(user)
I want to do some data analysis on it and import it into a pandas DataFrame so that the first column is stored as a string and the second as a tuple.
I tried mydataframe = pd.read_csv('file.csv', sep=';') and then converting the groups column with the astype method, mydataframe['groups'].astype('tuple'), but it doesn't work.
How can I store objects other than strings/ints/floats in a DataFrame?
Thanks.
Untested, but try
mydataframe['groups'] = mydataframe['groups'].apply(lambda text: tuple(text[1:-1].split(',')))
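Here is a small runnable sketch of the same idea, using the sample rows from the question. The helper name parse_groups is only for illustration, and it assumes every value is wrapped in parentheses with comma-separated members:

```python
import io
import pandas as pd

# The sample rows from the question, read from memory instead of a file.
csv_text = "username;groups\nalice;(admin,user)\nbob;(user)\n"
df = pd.read_csv(io.StringIO(csv_text), sep=";")

def parse_groups(text):
    # Drop the surrounding parentheses, split on commas, trim whitespace.
    return tuple(part.strip() for part in text.strip()[1:-1].split(","))

# apply returns a new Series, so assign it back to the column.
df["groups"] = df["groups"].apply(parse_groups)

print(df["groups"].tolist())
```

A DataFrame column can hold arbitrary Python objects, but such a column is stored with object dtype, so vectorised numeric operations will not apply to it.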

Pandas dataframe updating with .loc adds columns and indexes [duplicate]

I am trying to save a csv to a folder after making some edits to the file.
Every time I use pd.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid printing the index to csv.
I tried:
pd.read_csv('C:/Path to file to edit.csv', index_col = False)
And to save the file...
pd.to_csv('C:/Path to save edited file.csv', index_col = False)
However, I still got the unwanted index column. How can I avoid this when I save my files?
Use index=False.
df.to_csv('your.csv', index=False)
There are two ways to keep the index out of the CSV file.
As others have stated, you can pass index=False when saving the DataFrame:
df.to_csv('file_name.csv', index=False)
Or you can save the DataFrame with its index and, when reading the file back, drop the 'Unnamed: 0' column that holds the old index:
df.to_csv('file_name.csv')
df_new = pd.read_csv('file_name.csv').drop(['Unnamed: 0'], axis=1)
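To see where the extra column comes from, here is a rough sketch with a made-up two-row DataFrame. It writes to an in-memory string rather than a file, which to_csv does when no path is given:

```python
import io
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["alice", "bob"], "score": [1, 2]})

# Default: the index is written as a first column with an empty header.
with_index = df.to_csv()

# index=False leaves the index out of the output entirely.
without_index = df.to_csv(index=False)

# Reading the first version back gives the unnamed column a placeholder label.
reread = pd.read_csv(io.StringIO(with_index))
print(reread.columns.tolist())

# Dropping that column recovers the original layout.
cleaned = reread.drop(columns=["Unnamed: 0"])
print(cleaned.columns.tolist())
```

Note that index is the to_csv argument, while index_col belongs to read_csv, which is why passing index_col to to_csv in the question had no effect on the saved file.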
If the file already contains a saved index column that you want to get rid of, read it back as the index:
import pandas as pd
df = pd.read_csv('file.csv', index_col=0)
and then save it without the index:
df.to_csv('file.csv', index=False)
As others have stated, if you don't want to save the index column in the first place, you can use df.to_csv('processed.csv', index=False)
However, since the data you work with usually has some natural index of its own, say a 'timestamp' column, I would keep the index and load the data using it.
So, to save the indexed data, first set the index and then save the DataFrame. Note that set_index returns a new DataFrame, so assign the result:
df = df.set_index('timestamp')
df.to_csv('processed.csv')
Afterwards, you can either read the data with the index:
pd.read_csv('processed.csv', index_col='timestamp')
or read the data and then set the index:
df = pd.read_csv('filename.csv')
df = df.set_index('column_name')
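A sketch of the full round trip, assuming a made-up DataFrame with a 'timestamp' column and using an in-memory string in place of processed.csv:

```python
import io
import pandas as pd

# Hypothetical sample data with a natural index column.
df = pd.DataFrame({"timestamp": ["2024-01-01", "2024-01-02"], "value": [1.5, 2.5]})

# set_index is not in-place by default, so keep the returned DataFrame.
df = df.set_index("timestamp")

# The index is written as a regular column headed 'timestamp'.
csv_text = df.to_csv()

# Option 1: name the index column while reading.
loaded = pd.read_csv(io.StringIO(csv_text), index_col="timestamp")

# Option 2: read everything as columns, then set the index afterwards.
loaded_later = pd.read_csv(io.StringIO(csv_text)).set_index("timestamp")

print(loaded.index.name, loaded.columns.tolist())
```

Because the index has a name, no 'Unnamed: 0' column appears when the file is read back.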
Another solution if you want to keep this column as index.
pd.read_csv('filename.csv', index_col='Unnamed: 0')
If you want a cleanly formatted file, this statement works well:
dataframe_prediction.to_csv('filename.csv', sep=',', encoding='utf-8', index=False)
This gives you a CSV file with ',' as the column separator and UTF-8 encoding.
In addition, the numerical index won't appear.

Creating a new pandas Dataframe from CSV file with no header

I am trying to create a new dataframe from csv:
frame = DataFrame(data=pd.read_csv(path))
The result is correct except that the first line becomes the column names,
so I add columns to the DataFrame:
columns = ['person-id','time-stamp','loc-id']
frame = DataFrame(data=pd.read_csv(path),columns=columns)
Then it goes wrong: the DataFrame is all NaN.
This confuses me. Can anyone tell me what is going on?
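You don't need the DataFrame constructor, because read_csv already returns a DataFrame (unless squeeze=True is used, in which case a single-column result comes back as a Series; note that the squeeze parameter was removed in pandas 2.0):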
frame=pd.read_csv(path)
You need to tell read_csv() that your input has no column headers; by the time you give DataFrame the column names, it's too late, because the first row has already been used as the header. Try this:
columns = ['person-id','time-stamp','loc-id']
frame = pd.read_csv(path, names=columns)
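The all-NaN result can be reproduced with a short sketch. The two data rows below are invented; only the column names come from the question:

```python
import io
import pandas as pd

# Hypothetical headerless data with three columns.
csv_text = "1,100,7\n2,200,8\n"
columns = ["person-id", "time-stamp", "loc-id"]

# Without names=, the first data row is consumed as the header.
wrong = pd.read_csv(io.StringIO(csv_text))
print(wrong.columns.tolist())

# Passing columns= to the constructor selects columns by label from the
# existing DataFrame. None of the labels exist, so every value is NaN.
all_nan = pd.DataFrame(data=wrong, columns=columns)
print(all_nan)

# names= assigns the labels at parse time and keeps both rows as data.
frame = pd.read_csv(io.StringIO(csv_text), names=columns)
print(frame)
```

In other words, the constructor's columns argument picks existing columns rather than renaming them, which is why naming has to happen inside read_csv.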

Force Python Pandas DataFrame( read_csv() method) to avoid/not consider first row of my csv/txt file as header

I am reading a txt file (data.txt) using the pandas read_csv method. The file has 16 columns and 600 rows. However, after reading it into a DataFrame, I noticed that the first row of data.txt has been taken as the column headings. This reduces my DataFrame to 599 rows instead of the 600 in my text file. How can I force pandas not to use the first row as headers for the DataFrame?
I am using this code to read the file.
import pandas as pd
df = pd.read_csv("C:\<my_directory_path>\data.txt")
Just add header=None:
import pandas as pd
df = pd.read_csv(r"C:\<my_directory_path>\data.txt", header=None)  # raw string keeps the backslashes literal
You can use the parameter header=None to read your data in with integers as the column labels. Alternatively, if you know the names of your columns, you can pass in something like names=['col1','col2','col3'].
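A quick sketch of the row counts involved, using three invented rows in place of the 600-row data.txt:

```python
import io
import pandas as pd

# Hypothetical headerless data: three rows, three columns.
csv_text = "a,b,c\nd,e,f\ng,h,i\n"

# Default: the first row becomes the header, leaving two data rows.
default = pd.read_csv(io.StringIO(csv_text))

# header=None keeps all three rows and labels the columns 0, 1, 2.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# names= supplies the labels while still treating every row as data.
named = pd.read_csv(io.StringIO(csv_text), header=None,
                    names=["col1", "col2", "col3"])

print(len(default), len(no_header), len(named))
print(no_header.columns.tolist(), named.columns.tolist())
```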
