I have a dataframe and I want to save it to a csv file. This operation is quite simple because I just need to use the following command:
df.to_csv(namefile, sep=',', index=False)
The output is a csv file where each line contains the content of a row of the dataframe:
A,B,C,D
1,2,3,4
5,6,7,8
9,1,2,3
However, what I would like to do is have a blank line between every row, so that the output looks like this:
A,B,C,D

1,2,3,4

5,6,7,8

9,1,2,3
Basically I need to add CR and LF symbols between every line.
Can you suggest a smart and elegant way to achieve my goal?
Use parameter line_terminator='\n\n':
line_terminator : string, default '\n'
The newline character or character sequence to use in the output file
Demo:
df.to_csv(namefile, line_terminator='\n\n')
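Note that newer pandas releases rename this parameter to lineterminator (no underscore); the old spelling was deprecated and later removed. A minimal, self-contained sketch recreating the dataframe from the question (out.csv is a made-up file name):
import pandas as pd

df = pd.DataFrame({'A': [1, 5, 9], 'B': [2, 6, 1],
                   'C': [3, 7, 2], 'D': [4, 8, 3]})
# Newer pandas spells the argument without the underscore.
df.to_csv('out.csv', index=False, lineterminator='\n\n')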
I have a csv file that uses commas as separators, but every row has a quote character at the start and at the end.
In practice the data look like this
"0.00000E+000,6.25000E-001"
"1.00000E+000,1.11926E+000"
"2.00000E+000,9.01726E-001"
"3.00000E+000,7.71311E-001"
"4.00000E+000,6.82476E-001"
If I read the data using pd.read_csv() it just reads everything into a single column. What is the best workaround? Is there a simple way to pre-emptively strip the quote characters from the whole csv file?
If your file looks like
my_file=""""0.00000E+000,6.25000E-001"
"1.00000E+000,1.11926E+000"
"2.00000E+000,9.01726E-001"
"3.00000E+000,7.71311E-001"
"4.00000E+000,6.82476E-001"
"""
One way to remove the quotes prior to using Pandas would be
for line in my_file.split('\n'):
    print(line.replace('"', ''))
To write that to file, use
with open('output.csv', 'w') as file_handle:
    for line in my_file.split('\n'):
        file_handle.write(line.replace('"', '') + '\n')
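To feed the cleaned text straight into pandas without writing an intermediate file, one option is io.StringIO. A minimal sketch, assuming your data is in a file; the file name and the column names 'x' and 'y' are made up:
import io
import pandas as pd

with open('quoted.csv') as fh:      # hypothetical input file name
    cleaned = fh.read().replace('"', '')

# Parse the cleaned text in memory; 'x' and 'y' are placeholder column names.
df = pd.read_csv(io.StringIO(cleaned), header=None, names=['x', 'y'])
print(df)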
I'm struggling with one task that could save plenty of time. I'm new to Python so please don't kill me :)
I've got a huge txt file with millions of records. I used to split them in MS Access with the "|" delimiter, filter the data down to about 400K records, and then copy them to Excel.
So basically the file looks like this: [screenshot of the raw "|"-delimited data]
What I would like to have: [screenshot of the data split into columns]
I'm using Spyder, so it would be great to see the data in the Variable Explorer so I can easily check it and (after additional filtering) export it to Excel.
I use LibreOffice so I'm not 100% sure about Excel, but if you change the .txt to .csv and open the file with Excel, it should let you change the delimiter from a comma to '|' and import it directly. That works with LibreOffice Calc, anyway.
You have to split the file into lines, then split each line on the '|' character and map the data to a list of dicts.
with open('filename') as file:
    rows = (line.strip().split('|') for line in file)
    data = [{'id': row[0], 'fname': row[1]} for row in rows]
You have to fill in the rest of the fields.
Doing this with pandas will be much easier.
Note: I am assuming that each entry is on a new line.
import pandas as pd
data = pd.read_csv("data.txt", delimiter='|')
# Do something here or let it be if you want to just convert text file to excel file
data.to_excel("data.xlsx")
My CSV file has 3 columns: Name, Age, and Sex, and sample data is:
AlexÇ39ÇM
#Ç#SheebaÇ35ÇF
#Ç#RiyaÇ10ÇF
The column delimiter is 'Ç' and the record delimiter is '#Ç#'. Note the first record doesn't have the record delimiter (#Ç#), but all other records do. Could you please tell me how to read this file and store it in a dataframe?
Both the csv and pandas modules support reading CSV files directly. However, since you need to modify the file contents line by line before further processing, I suggest reading the file line by line, modifying each line as desired, and storing the processed data in a list for further handling.
The necessary steps include:
open file
read file line by line
remove the newline char (which is part of the line when using readlines())
replace record delimiter (since a record is equivalent to a line)
split lines at column delimiter
Since .split() returns a list of string elements, we get an overall list of lists, where each 'sub-list' contains/represents the data of a line/record. Data formatted like this can be read by pandas.DataFrame.from_records(), which comes in quite handy at this point:
import pandas as pd

with open('myData.csv') as file:
    # `.strip()` removes the newline character from each line
    # `.replace('#;#', '')` removes the record delimiter '#;#' from each line
    # `.split(';')` splits at the column delimiter and returns a list of string elements
    lines = [line.strip().replace('#;#', '').split(';') for line in file.readlines()]

df = pd.DataFrame.from_records(lines, columns=['Name', 'Age', 'Sex'])
print(df)
Remarks:
I changed Ç to ; which worked better for me due to encoding issues. However, the basic idea of the algorithm is still the same.
Reading data manually like this can become quite resource-intensive, which might be a problem when handling larger files. There might be more elegant ways which I am not aware of. If you run into performance problems, try reading the file in chunks (see the sketch below) or look for more efficient implementations.
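As a rough illustration of chunked reading, here is a sketch that processes the file in batches via readlines() with a size hint, assuming the same ';'-delimited layout as above; the batch size is arbitrary:
import pandas as pd

records = []
with open('myData.csv') as file:
    while True:
        batch = file.readlines(1_000_000)  # read roughly 1 MB of lines per batch
        if not batch:
            break
        records.extend(
            line.strip().replace('#;#', '').split(';') for line in batch
        )

df = pd.DataFrame.from_records(records, columns=['Name', 'Age', 'Sex'])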
Having trouble with reading a csv file into a pandas dataframe where the line endings are not standard.
Here is my code:
df_feb = pd.read_csv(data_location, sep=",", nrows=500, header=None, skipinitialspace=True, encoding='utf-8')
Here is the output (personal info scratched out): [screenshot of the mis-parsed dataframe]
This is what the input data looks like: [screenshot of the raw file]
The above output splits what should be a single line into 4 lines. A new line should start for every phone number (phone number = scratched out bit).
I am aiming to have each line look like this: [screenshot of the desired output]
Thank you in advance for your help!
If the file format follows any rule (i.e. it is not a unique format for each record), then I suggest you write your own conversion tool.
Here is what I suggest the tool should do (a sketch follows the list):
Read the file as plain text.
Put 4 lines into 1 record/class object (as I see in the picture, each record seems to span 4 lines).
Parse the lines (split by comma, tab, whatever you have) to get the attributes.
Write the attributes to another file, separated by tab (or comma) => your csv.
Now you can load your csv into pandas.
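A minimal sketch of such a tool, assuming each logical record spans exactly four physical lines and the fields are comma-separated (both the line count and the file names are assumptions):
import csv
import pandas as pd

with open('raw.txt') as src, open('fixed.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    buffer = []
    for line in src:
        buffer.append(line.rstrip('\n'))
        if len(buffer) == 4:                  # one full record collected
            # Rejoin the broken physical lines, then split into fields.
            writer.writerow(''.join(buffer).split(','))
            buffer = []

df = pd.read_csv('fixed.csv', header=None)    # now pandas can load it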
I have a file that looks like this:
1111,AAAA,aaaa\n
2222,BB\nBB,bbbb\n
3333,CCC\nC,cccc\n
...
Where \n represents a newline.
When I read this line-by-line, it's read as:
1111,AAAA,aaaa\n
2222,BB\n
BB,bbbb\n
3333,CCC\n
C,cccc\n
...
This is a very large file. Is there a way to read a line until a specific number of delimiters, or remove the newline character within a column in Python?
I think after you read the line, you need to count the number of commas
aStr.count(',')
While the number of commas is too small (there can be more than one \n in a record), read the next line and concatenate the strings:
while aStr.count(',') < Num:
    another = file.readline()
    aStr = aStr + another
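Putting that together, a minimal sketch, assuming three comma-separated fields per record (so a complete record contains exactly two commas) and a hypothetical file name:
NUM_COMMAS = 2                     # e.g. 3 fields per record -> 2 commas
records = []
with open('data.txt') as file:     # hypothetical file name
    for aStr in file:
        # A record may be split across physical lines; keep appending
        # until it contains the expected number of commas.
        while aStr.count(',') < NUM_COMMAS:
            another = file.readline()
            if not another:        # guard against a truncated last record
                break
            aStr += another
        # Remove embedded and trailing newlines, then split into fields.
        records.append(aStr.replace('\n', '').split(','))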
1111,AAAA,aaaa\n
2222,BB\nBB,bbbb\n
According to your file, the \n here is not actually a newline character; it is plain text.
For actually stripping newline characters you could use strip() or variations like rstrip() or lstrip().
If you work with large files you don't need to load the full content into memory. You can iterate line by line, stopping on a counter or any other condition.
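If the \n really is literal two-character text rather than a real line ending, a small sketch to drop it before splitting (the file name is assumed):
with open('data.txt') as fh:                # hypothetical file name
    for raw in fh:
        line = raw.rstrip('\n')             # drop the real line ending
        # Remove the literal backslash-n text, then split into fields.
        fields = line.replace('\\n', '').split(',')
        print(fields)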
I think perhaps you are parsing a CSV file that has embedded newlines in some of the text fields. Further, I suppose that the program that created the file put quotation marks (") around the fields.
That is, I suppose that your text file actually looks like this:
1111,AAAA,aaaa
2222,"BB
BB",bbbb
3333,"CCC
C",cccc
If that is the case, you might want to use code with better CSV support than just line.split(','). Consider this program:
import csv
with open('foo.csv') as fp:
    reader = csv.reader(fp)
    for row in reader:
        print(row)
Which produces this output:
['1111', 'AAAA', 'aaaa']
['2222', 'BB\nBB', 'bbbb']
['3333', 'CCC\nC', 'cccc']
Notice the five lines (delimited by newline characters) of the CSV file become 3 rows (some with embedded newline characters) in the CSV data structure.