I have a file with 40 columns and 600,000 rows. After processing it in a pandas DataFrame, I would like to save the DataFrame to CSV with a different spacing width for each column. There is a sep kwarg in df.to_csv; I tried a regex, but I'm getting this error:
TypeError: "delimiter" must be a 1-character string.
I want the output with different column spacing, as shown below
A    B      C  D        E   F     G
1    3      5  8        8   9     8
1    3      5  8        8   9     8
1    3      5  8        8   9     8
1    3      5  8        8   9     8
1    3      5  8        8   9     8
Using the code below I'm getting tab-delimited output, where all the columns have the same spacing.
df.to_csv("D:\\test.txt", sep = "\t", encoding='utf-8')
A B C D E F G
1 3 5 8 8 9 8
1 3 5 8 8 9 8
1 3 5 8 8 9 8
1 3 5 8 8 9 8
1 3 5 8 8 9 8
I don't want to use a loop; it might take a lot of time for 600k lines.
Thank you for the comments, they helped me.
Below is the code.
import pandas as pd
#Create DataFrame
df = pd.DataFrame({'A':[0,1,2,3],'B':[0,11,2,333],'C':[0,1,22,3],'D':[00,1,2,33]})
#Convert the Columns to string
df[df.columns]=df[df.columns].astype(str)
#List of pad widths, one per column
SepWidth = [5,6,3,8]
#Temp dict
tempdf = {}
#Pad each column to its target width
for i, eCol in enumerate(df):
    tempdf[i] = df[eCol].str.pad(width=SepWidth[i])
#Final DataFrame
Fdf = pd.concat(tempdf, axis=1)
#print Fdf
#Export to csv
Fdf.to_csv("D:\\test.txt", sep='\t', index=False, header=False, encoding='utf-8')
Output of test.txt:
0 0 0 0
1 11 1 1
2 2 22 2
3 333 3 33
UPDATE
The tab delimiter ('\t') was being added on top of the padding when using pandas.to_csv. Instead of pandas.to_csv, I'm using the code below to save as txt:
numpy.savetxt(file, df.values, fmt='%s')
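Putting the pieces together, a minimal end-to-end sketch of the padded export (the pad widths and output path are just placeholders for illustration):
import numpy as np
import pandas as pd
#Sample frame standing in for the real 40-column data
df = pd.DataFrame({'A':[0,1,2,3],'B':[0,11,2,333],'C':[0,1,22,3],'D':[0,1,2,33]})
#Pad every column to its own width (loops over the columns only, not the 600k rows)
widths = [5, 6, 3, 8]
padded = pd.concat(
    [df[col].astype(str).str.pad(width=w) for col, w in zip(df.columns, widths)],
    axis=1)
#Write the padded strings directly; savetxt's single-space delimiter replaces the tab that to_csv would add
np.savetxt("test.txt", padded.values, fmt='%s')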
I have a data frame that looks like this
A B C
1 4 7
2 5 8
3 6 9
And also another data frame that looks like this
A B C
2 1 7
4 3 9
6 5 8
How can I combine those two data frames to get a new data frame that looks like this
A B C
1 4 7
2 5 8
3 6 9
2 1 7
4 3 9
6 5 8
Basically, the two data frames have the same column names and number of columns. I just want to combine all of the rows. Would prefer using pandas to do this.
Check with append
df1 = df1.append(df2)
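Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on recent versions the same result comes from pd.concat:
import pandas as pd
df1 = pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[7,8,9]})
df2 = pd.DataFrame({'A':[2,4,6],'B':[1,3,5],'C':[7,9,8]})
#Stack the rows of df2 under df1 and renumber the index
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)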
I have this data set in an Excel file. I want to keep only the values whose length is 6, delete the rest, and export the result with the individual values split into separate columns.
Please tell me if there is a function to read the file and split the numeric values.
From your shared data it seems there are spaces between the numbers, so they will already be read as str.
You can try the code below.
Your df looks like this:
a
0 11
1 2
2 3 2 4
3 5
4 1
5 6
6 1 1
7 6
8 6 7 7 7 6 6 8 8 8
9 6 8 7 9 5 2 1 44 6 55
10 6 8 7 9 5 2 1 44 6 55 4 4 4 4
Filter rows with length equal to 6:
df=df[df['a'].str.len()==6]
Then split them using the split() method like this:
df['a'].str.split(" ", expand = True)
output:
0 1 2 3
2 3 2 4
EDIT:
If you are having trouble with memory while reading a large file, you can refer to this SO post,
or read the file in chunks and append/save the output to a new file:
reader = pd.read_csv(filePath,chunksize=1000000,low_memory=False,header=0)
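A rough sketch of that chunked approach, assuming the same single column 'a' as above (file names are placeholders):
import pandas as pd
filePath = "input.csv"  #placeholder path
reader = pd.read_csv(filePath, chunksize=1000000, low_memory=False, header=0)
#For each chunk: keep the rows of column 'a' with length 6, split them, and append the result to one output file
for i, chunk in enumerate(reader):
    vals = chunk['a'].astype(str)
    out = vals[vals.str.len() == 6].str.split(" ", expand=True)
    out.to_csv("output.csv", mode="a", header=(i == 0), index=False)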
I have a (2.3m x 33) size dataframe. As I always do when selecting columns to keep, I use
colsToKeep = ['A','B','C','D','E','F','G','H','I']
df = df[colsToKeep]
However, this time the data under these columns becomes completely jumbled up on running the code. Entries that belong under column A might end up under column D, for example, totally at random.
Has anybody experienced this kind of behavior before? There is nothing out of the ordinary about the data and the df is totally fine before running these lines. Code run before problem begins:
with open('file.dat','r') as f:
    df = pd.DataFrame(l.rstrip().split() for l in f)
#rename columns with the first row
df.columns = df.iloc[0]
#drop first row which is now duplicated
df = df.iloc[1:]
#Remove all the NaN-named columns that appeared (33 of them)
df = df.loc[:,df.columns.notnull()]
colsToKeep = ['A','B','C','D','E','F','G','H','I']
df = df[colsToKeep]
Data suddenly goes from being nicely formatted such as:
A B C D E F G H I
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
to something more random like:
A B C D E F G H I
7 9 3 4 5 1 2 8 6
3 2 9 2 1 6 7 8 4
2 1 3 6 5 4 7 9 8
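As an aside, a simpler way to build this frame would be to let read_csv parse the whitespace-delimited file directly, which avoids the manual split/rename/drop steps (a sketch, assuming file.dat has the column names in its first row):
import pandas as pd
#Split on runs of whitespace and use the first row as the header
df = pd.read_csv('file.dat', sep=r'\s+', header=0)
colsToKeep = ['A','B','C','D','E','F','G','H','I']
df = df[colsToKeep]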
I imported the data from a CSV file with pandas. I want to split a column containing 50 values (0 to 49) into 5 rows of ten values each. Can anyone tell me how I can do this reshape as a pandas frame?
Let me rephrase what I said:
I attached the data that I have. I wanted to select the second column and split it into two rows, each having 10 values.
This is the code I have done so far (I couldn't get a picture of all 50 rows, so I have only put 20 rows):
import numpy as np
import pandas as pd
df = pd.read_csv('...csv')
df.iloc[:50,:2]
Consider the dataframe df
np.random.seed([3,1415])
df = pd.DataFrame(dict(mycolumn=np.random.randint(10, size=50)))
using numpy and reshape'ing, ignoring indices
pd.DataFrame(df.mycolumn.values.reshape(5, -1))
0 1 2 3 4 5 6 7 8 9
0 0 2 7 3 8 7 0 6 8 6
1 0 2 0 4 9 7 3 2 4 3
2 3 6 7 7 4 5 3 7 5 9
3 8 7 6 4 7 6 2 6 6 5
4 2 8 7 5 8 4 7 6 1 5
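Applied to the question's own data, the same idea would look something like this (assuming the 50 values sit in the second column, as in the df.iloc[:50,:2] snippet above):
import pandas as pd
df = pd.read_csv('...csv')
#Take the first 50 values of the second column and reshape them into 5 rows of 10
reshaped = pd.DataFrame(df.iloc[:50, 1].values.reshape(5, -1))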
How would I combine multiple columns into a single column in an Excel file using pandas in Python?
a=[5,4,3,2,5,4,6,9,8,4,3,2,6]
b=[11,12,1,2,11,9,11,11,4,12,0,2,11]
c=[9,5,4,6,10,5,12,13,14,10,3,6.1,5]
from pandas import DataFrame
df = DataFrame({'Stimulus Time': a, 'Reaction Time': b})
df.to_excel('case2.xlsx',sheet_name='sheet1', index=False)
This gives me the following output:
Reaction Time Stimulus Time
0 11 5
1 12 4
2 1 3
3 2 2
4 11 5
5 9 4
6 11 6
7 11 9
8 4 8
9 12 4
10 0 3
11 2 2
12 11 6
However I need the output in the following format:
Reaction Time Stimulus Time
From 0 to 11 5
From 1 to 12 4
From 2 to 1 3
From 3 to 2 2
.......
.......
........
Thanks,
D
I'd suggest using an intermediate list which converts the start and end times to a string.
e.g.
d = ["From "+str(i)+" to "+ str(j) for i,j in zip(range(0,len(b)),b)]