Reading from csv file [duplicate]

This question already has answers here:
Python Pandas: How to read only first n rows of CSV files in?
(3 answers)
Closed last month.
How can I read the first cell from my CSV file and store it as a variable?
For example, my file looks like this:
header 1 header 2
AM
Depth Value
10 20
30 122
60 222
How can I read the AM cell and store it in a variable x?
And how can I then ignore the AM cell and start my data frame from the real headers (Depth, Value)?

You can get a specific row and column by position with iloc.
For example, df.iloc[0,0] returns AM.
pandas.read_csv can also skip rows while reading the data. pd.read_csv("test.csv", sep="\t", skiprows=1) skips the first line of the file; set skiprows to the number of lines that sit above the Depth/Value header row.
Result:
0 10 20
1 30 122
2 60 222

Use pd.read_csv and then select the first row:
import pandas as pd
df = pd.read_csv('your file.csv')
x = df.iloc[0]['header 1']
Then, to delete it, use df.drop:
df.drop(0, inplace=True)
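After the drop, the Depth/Value row is still an ordinary data row. As a rough sketch of the full round trip (assuming the file layout from the question, recreated here in memory with io.StringIO so the snippet runs on its own), you could keep the cell, drop its row, and then promote the next row to column names. The astype(int) step assumes the remaining data is all integers:

```python
import io
import pandas as pd

# Stand-in for 'your file.csv', assuming the layout described in the question.
csv_text = "header 1,header 2\nAM,\nDepth,Value\n10,20\n30,122\n60,222\n"

df = pd.read_csv(io.StringIO(csv_text))
x = df.iloc[0]['header 1']          # the AM cell

df = df.drop(0)                     # remove the AM row
df.columns = df.iloc[0].tolist()    # the next row holds the real column names
df = df.iloc[1:].reset_index(drop=True)
df = df.astype(int)                 # values were read as text; assumes integers only

print(x)
print(df)
```

Reading the file twice with skiprows or header, as in the other answers, avoids the type conversion at the end.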

I am using a dummy CSV file generated from the data you posted in the question.
import pandas as pd
# read data
df = pd.read_csv('test.csv')
The loaded DataFrame looks like this:
header 1 header 2
0 AM NaN
1 Depth Value
2 10 20
3 30 122
4 60 222
The usecols parameter selects which columns to load. It takes a list of column positions (or names), so usecols=[0] loads only the first column and usecols=[1] only the second.
You can store the result in x or any other variable. Note that this gives a one-column DataFrame, not a single cell:
# Change usecols to load various columns in the data
x = pd.read_csv('test.csv',usecols=[0])
To choose the header row:
# header is the 0-based line number of the row to use as column names
pd.read_csv('test.csv',header=2)
Depth Value
0 10 20
1 30 122
2 60 222
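Putting the pieces together, here is one way it could look, assuming the same layout as the dummy file (csv_text is a stand-in for test.csv so the snippet runs standalone). It reads the AM cell on its own with skiprows and nrows, then reads the table with header=2:

```python
import io
import pandas as pd

# Stand-in for test.csv, assuming the layout shown above.
csv_text = "header 1,header 2\nAM,\nDepth,Value\n10,20\n30,122\n60,222\n"

# Skip the first line and read exactly one row, without treating it as a header.
first_row = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1, nrows=1)
x = first_row.iloc[0, 0]

# Use line 2 (0-based) as the header; the lines above it are discarded.
df = pd.read_csv(io.StringIO(csv_text), header=2)

print(x)
print(df)
```

Because the header row is picked at read time, the Depth and Value columns are parsed as numbers directly.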

Related

Python Panda DataFrame add column headers to data from clipboard dynamically

I am copying data from my clipboard that contains no headers. I don't want the index column, and I want to name the columns dynamically by count (1, 2, 3, ...), skipping the first column. The output data set would look like the following.
1 2 3 4 5 6 7 8 9 10
1981 5012.0 8269.0 10907.0 11805.0 13539.0 16181.0 18009.0 18608.0 18662.0 18834.0
Here is the code I'm starting with. The code works, but the column headers aren't dynamic and the data set may not always have the same number of columns. I'm not sure how to make the column headers dynamic.
import pandas as pd
df = pd.read_clipboard(index_col = 0, names = ["","1","2","3","4","5","6","7","8","9","10"])

To get exactly what you are looking for you can use:
df = pd.read_clipboard(header=None, index_col=0).rename_axis(None)
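read_clipboard passes its keyword arguments on to read_csv, so the same options can be tried without a clipboard. A small sketch, using a shortened copy of the row from the question as stand-in text (sample_text is a made-up name):

```python
import io
import pandas as pd

# Shortened copy of the row from the question, standing in for the clipboard.
sample_text = "1981 5012.0 8269.0 10907.0\n"

df = pd.read_csv(
    io.StringIO(sample_text),
    sep=r"\s+",        # whitespace separated, like read_clipboard's default
    header=None,       # no header line in the data
    index_col=0,       # first column becomes the index
).rename_axis(None)    # drop the leftover index name

numbered = df.columns.tolist()       # integer labels, one per data column

# Optional: turn the integer labels into strings such as "1", "2", "3".
df.columns = df.columns.astype(str)
print(df)
```

The column labels come from the position of each column, so they follow however many columns the pasted data has.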

Can I update the value of a column based on the same column value in a python dataframe?

I have a dataframe that captures characteristics of people accessing a webpage. One of the features I get as input is the list of times each user spent on the page. I want to replace this column with the maximum value of each list. Is there a way to do this?
Assume that my data is:
df = pd.DataFrame({'Page_id':[1,2,3,4], 'User_count':[5,3,3,6], 'Max_time':[[45,56,78,90,120],[87,109,23],[78,45,89],[103,178,398,121,431,98]]})
What I want to do is convert the column Max_time in df to Max_time: [120, 109, 89, 431]
I am not supposed to add another column for computing the max separately as this table structure cannot be altered.
I tried the following:
for i in range(len(df)):
    df.loc[i]["Max_time"] = max(df.loc[i]["Max_time"])
But this is not changing the column as I intended it to. Is there something that I missed?

df = pd.DataFrame({'Page_id':[1,2,3,4],'User_count':[5,3,3,6],'Max_time':[[45,56,78,90,120],[87,109,23],[78,45,89],[103,178,398,121,431,98]]})
df.Max_time = df.Max_time.apply(max)
Result:
Page_id User_count Max_time
0 1 5 120
1 2 3 109
2 3 3 89
3 4 6 431
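For what it's worth, the loop in the question likely fails because df.loc[i]["Max_time"] = ... is chained indexing: df.loc[i] returns a temporary row object, and the assignment lands on that temporary rather than on df. If a loop is really needed, a single-step setter such as df.at should work. A sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({'Page_id': [1, 2, 3, 4],
                   'User_count': [5, 3, 3, 6],
                   'Max_time': [[45, 56, 78, 90, 120], [87, 109, 23],
                                [78, 45, 89], [103, 178, 398, 121, 431, 98]]})

# df.at[row_label, column_label] reads and writes one cell in a single step,
# so the assignment goes to df itself and not to a temporary copy.
for i in df.index:
    df.at[i, 'Max_time'] = max(df.at[i, 'Max_time'])

print(df)
```

The apply(max) version above is shorter and is usually the better choice; the loop is only here to show why the original attempt had no effect.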

You can use this:
import numpy as np
df['Max_time'] = df['Max_time'].map(lambda x: np.max(x))

Force Pandas to keep multiple columns with the same name

I'm building a program that collects data and adds it to an ongoing excel sheet weekly (read_excel() and concat() with the new data). The issue I'm having is that I need the columns to have the same name for presentation (it doesn't look great with x.1, x.2, ...).
I only need this on the final output. Is there any way to accomplish this? Would it be too time consuming to modify pandas?

You can create a list of custom headers and pass it when writing to Excel:
newColNames = ['x', 'x', 'x']  # ...one entry per column in df
df.to_excel(path,header=newColNames)

You can add trailing spaces to the column names. They will look the same in Excel, but pandas can tell them apart.
import pandas as pd
df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], columns=['x','x ','x  '])
df
x x x
0 1 2 3
1 4 5 6
2 7 8 9
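Another option, sketched under the assumption that the only duplicates are the x.1, x.2 style names pandas generates on read: keep the unique names while processing and strip the numeric suffix only when writing. to_csv is used here so the snippet runs without an Excel engine; to_excel takes the same header argument, as in the to_excel answer above. display_names is a made-up name:

```python
import io
import re
import pandas as pd

# Columns as pandas would name them after reading duplicate headers.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['x', 'x.1', 'x.2'])

# Strip a trailing ".<digits>" suffix for display only.
# Assumption: no real column name legitimately ends in ".<digits>".
display_names = [re.sub(r"\.\d+$", "", str(name)) for name in df.columns]

buffer = io.StringIO()
df.to_csv(buffer, header=display_names, index=False)
output = buffer.getvalue()
print(output)
```

The DataFrame itself keeps its unique column names, so concat and column selection behave as usual until the final write.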

Using Pandas to Manipulate Multiple Columns

I have a 30+ million row data set that I need to apply a whole host of data transformation rules to. For this task, I am trying to explore Pandas as a possible solution because my current solution isn't very fast.
Currently, I am performing a row by row manipulation of the data set, and then exporting it to a new table (CSV file) on disk.
There are 5 functions users can perform on the data within a given column:
remove white space
Capitalize all text
format date
replace letter/number
replace word
My first thought was to use the dataframe's apply or applymap, but this can only be used on a single column.
Is there a way to use apply or applymap on many columns instead of just one?
Is there a better workflow I should consider, since I could be manipulating anywhere from 1 to n columns in my dataset, where the maximum number of columns is currently around 30?
Thank you

If your function works only on a Series, you can use a list comprehension with concat:
import pandas as pd
data = pd.DataFrame({'A':[' ff ','2','3'],
                     'B':[' 77','s gg','d'],
                     'C':['s',' 44','f']})
print (data)
A B C
0 ff 77 s
1 2 s gg 44
2 3 d f
print (pd.concat([data[col].str.strip().str.capitalize() for col in data], axis=1))
A B C
0 Ff 77 S
1 2 S gg 44
2 3 D F
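If the set of columns and rules varies per run, one possible arrangement is sketched below, reusing the sample frame from above. The names shared_columns and rules, and the 'gg' to 'hh' replacement, are made up for illustration. DataFrame.apply runs a function once per selected column, and a plain dict can map individual columns to their own transformation:

```python
import pandas as pd

data = pd.DataFrame({'A': [' ff ', '2', '3'],
                     'B': [' 77', 's gg', 'd'],
                     'C': ['s', ' 44', 'f']})

# Same transformation on several columns: apply passes each selected
# column to the function as a Series.
shared_columns = ['A', 'C']  # hypothetical selection
data[shared_columns] = data[shared_columns].apply(
    lambda col: col.str.strip().str.upper())

# Different transformation per column: column name -> Series function.
rules = {
    'B': lambda col: col.str.replace('gg', 'hh', regex=False),
}
for column, rule in rules.items():
    data[column] = rule(data[column])

print(data)
```

The .str methods work on a whole column at a time, which is usually much faster than row-by-row manipulation, though on 30+ million rows it is worth measuring with your own data.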

Missing first row while reading from file - Python Pandas [duplicate]

This question already has answers here:
Prevent pandas read_csv treating first row as header of column names
(4 answers)
Closed 3 years ago.
I have a file which has coordinates like
1 1
1 2
1 3
1 4
1 5
and so on
There are no zeros in them. I tried using comma and tab as delimiters and am still stuck on the same problem.
When I printed the output to the screen I saw something very weird: it looks like the very first line is missing.
The output after running pa.read_csv('co-or.txt',sep='\t') is as follows
1 1
0 1 2
1 1 3
2 1 4
3 1 5
and so on..
I am not sure if I am missing any arguments in this.
Also, when I tried to convert it to a NumPy array using np.array, it is again missing the first line and hence the first element [1 1].

df = pd.read_csv('data.csv', header=None)
You need to specify header=None; otherwise pandas takes the first row as the header.
If you want to give the columns meaningful names, you can use the names parameter:
df = pd.read_csv('data.csv', header=None, names=['foo','bar'])
Spend some time with the pandas documentation as well to get familiar with the API, in particular the page for read_csv.
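To see the difference side by side, here is a small sketch that recreates the tab-separated coordinates from the question in memory (text is a stand-in for co-or.txt):

```python
import io
import pandas as pd

# Stand-in for co-or.txt: five tab-separated coordinate pairs, no header line.
text = "1\t1\n1\t2\n1\t3\n1\t4\n1\t5\n"

# Default behaviour: the first line is consumed as column names.
with_header = pd.read_csv(io.StringIO(text), sep="\t")

# header=None keeps the first line as data; names supplies the labels.
without_header = pd.read_csv(io.StringIO(text), sep="\t",
                             header=None, names=['foo', 'bar'])

print(len(with_header))      # one row short
print(len(without_header))   # all rows present
print(without_header.to_numpy()[0])
```

The array produced by to_numpy (or np.array) then starts with the [1 1] pair that was missing before.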

You can also read the raw lines with plain Python:
with open('file.dat', 'r') as file:
    lines = file.readlines()
This works and keeps every line, including the first.
