If I have a .txt file with this content:
1020,"Balance",+10000
1030,"Something",-5000
How do I remove what's in the middle, so that the only thing I'm left with is:
1020,+10000
1030,-5000
If it's always in the same index:
with open('yourfile.txt', 'r') as f:
    lines = f.readlines()

output = []
for line in lines:
    # Strip the trailing newline, then split on commas
    temp = line.strip().split(",")
    output.append(temp[0])
    output.append(temp[2])
print(output)
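To get the lines back in the comma-separated form the question asks for, the selected fields can be rejoined. A minimal sketch, assuming every line has three comma-separated fields and no commas inside the quoted text; keep_outer_fields is a hypothetical helper name:

```python
def keep_outer_fields(line):
    # Split on commas and keep only the first and last fields.
    parts = line.strip().split(",")
    return ",".join([parts[0], parts[-1]])

# Sample lines taken from the question.
sample = ['1020,"Balance",+10000\n', '1030,"Something",-5000\n']
result = [keep_outer_fields(line) for line in sample]
print("\n".join(result))
```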
I would approach it with a regex:
import re
string = "1030,\"Something\",-5000"
stripped = re.sub("[\"].*[\"]", "", string)
print(stripped)
This prints 1030,,-5000. From there you can remove one of the commas.
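The quoted field and the comma that follows it can also be removed in a single substitution. This is only a sketch, assuming the quoted text never contains a double quote itself; drop_quoted_field is a hypothetical name:

```python
import re

def drop_quoted_field(line):
    # Match an opening quote, any non-quote characters, the closing quote,
    # and the comma right after it, then delete the whole match.
    return re.sub(r'"[^"]*",', "", line)

cleaned = drop_quoted_field('1030,"Something",-5000')
print(cleaned)
```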
You could import the data into a dataframe using Pandas and then delete the second column like this.
import pandas as pd
df = pd.read_csv('example.txt', header=None)
del df[1]
print(df)
You can use csv module for this task:
import csv
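def removeColumn(fn1, fn2, idx=1):
    # newline="" lets the csv module handle line endings itself
    with open(fn1, "r", newline="") as csvfile1:
        reader = csv.reader(csvfile1)
        with open(fn2, "w", newline="") as csvfile2:
            writer = csv.writer(csvfile2)
            for row in reader:
                # Keep everything before and after column idx
                writer.writerow(row[:idx] + row[idx+1:])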
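The column-dropping logic can be checked in memory before running it on real files. A small sketch using io.StringIO in place of the files; drop_column is a hypothetical helper, and the sample rows are the ones from the question:

```python
import csv
import io

def drop_column(rows, idx=1):
    # Yield each row without the field at position idx.
    for row in rows:
        yield row[:idx] + row[idx + 1:]

source = io.StringIO('1020,"Balance",+10000\n1030,"Something",-5000\n')
target = io.StringIO()
writer = csv.writer(target, lineterminator="\n")
writer.writerows(drop_column(csv.reader(source)))
print(target.getvalue())
```

One reason to prefer the csv module over a plain split(",") is that it copes with commas inside the quoted field.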
Related
I am aggregating data in a CSV file with this code:
import pandas
df = pandas.read_csv("./input.csv", delimiter=";", low_memory=False)
df.head()
count_severity = df.groupby("PHONE")["IMEI"].unique()
has_multiple_elements = count_severity.apply(lambda x: len(x)>1)
result = count_severity[has_multiple_elements]
result.to_csv("./output.csv", sep=";")
and in some lines of the received file, I get the following:
It turns out that I get the second column, which is after the sign ;, divided into two rows.
Could you tell me please, how to get rid of this line break? I tried adding a parameter line_terminator=None in result.to_csv - it didn't help.
Any method is accepted, even if you have to overwrite this file and save a new one. I also tried this:
import pandas as pd
output_file = open("./output.csv", "r")
output_file = ''.join([i for i in output_file]).replace("\n", "")
output_file_new = open("./output_new.csv", "w")
output_file_new.writelines(output_file)
output_file_new.close()
But then I get solid lines, which is not good at all.
To summarize, I should get the format of this file:
Thank You!
If your wrong lines always start with a comma, you could just replace the sequence "\n," by ",".
with open("./output.csv", "r") as file:
    content = file.read()
new_content = content.replace("\n,", ",")
with open("./new_output.csv", "w") as file:
    file.write(new_content)
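If the broken lines do not reliably start with a comma, another option is to treat every line that lacks the ";" separator as the continuation of the previous record. This is a sketch under that assumption; merge_continuations and the sample text are made up for illustration, since the actual file content is not shown:

```python
def merge_continuations(text, sep=";"):
    merged = []
    for line in text.splitlines():
        if sep in line or not merged:
            # A line with the separator starts a new record.
            merged.append(line)
        else:
            # No separator: glue it onto the previous record.
            merged[-1] += line
    return "\n".join(merged) + "\n"

# Hypothetical content with one record split over two lines.
broken = "PHONE;IMEI\n111;[1 2\n 3]\n222;[4 5]\n"
fixed = merge_continuations(broken)
print(fixed)
```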
I am trying to read a .txt file and save the data in each column as a list. Each column in the file contains a variable which I will later use to plot a graph. I have tried looking up the best method to do this, and most answers recommend opening the file, reading it, and then either splitting or saving the columns as a list. The data in the .txt is as follows -
0 1.644231726
0.00025 1.651333945
0.0005 1.669593478
0.00075 1.695214575
0.001 1.725409504
The delimiter is a space ' ' or a tab '\t'. I have used the following code to try to append the columns to my variables -
import csv
with open('./rvt.txt') as file:
    readfile = csv.reader(file, delimiter='\t')
    time = []
    rim = []
    for line in readfile:
        t = line[0]
        r = line[1]
        time.append(t)
        rim.append(r)
print(time, rim)
However, when I try to print the lists, time and rim, using print(time, rim), I get the following error message -
r = line[1]
IndexError: list index out of range
I am, however, able to print only the 'time' if I comment out the r=line[1] and rim.append(r) parts. How do I approach this problem? Thank you in advance!
I would suggest the following:
import pandas as pd
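# sep=r'\s+' splits on any run of spaces or tabs; names= supplies the column names
df = pd.read_csv('./rvt.txt', sep=r'\s+', header=None, names=['time', 'rim'])
Then you can use list(df['time']) and list(df['rim']) to work with your columns as lists.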
The problem is the delimiter: the dataset separates the columns with multiple spaces ' ', not with tabs.
When you use '\t' and print each line, you can see that the line is not being split on the delimiter, e.g.:
['0 1.644231726']
['0.00025 1.651333945']
['0.0005 1.669593478']
['0.00075 1.695214575']
['0.001 1.725409504']
To get the desired result you can use (space) as delimiter and filter the empty values:
readfile = csv.reader(file, delimiter=" ")
time, rim = [], []
for line in readfile:
    # Drop the empty strings produced by consecutive spaces
    line = list(filter(lambda x: len(x), line))
    t = line[0]
    r = line[1]
    time.append(t)
    rim.append(r)
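When the separator may be spaces or tabs, str.split() with no arguments splits on any run of whitespace, so no filtering is needed. A minimal sketch using an in-memory sample instead of rvt.txt (the sample mixes spaces and a tab on purpose), converting the values to float so they are ready for plotting:

```python
import io

# Stand-in for open('./rvt.txt'); two rows from the question.
sample = io.StringIO("0   1.644231726\n0.00025\t1.651333945\n")

time, rim = [], []
for line in sample:
    parts = line.split()  # splits on any run of spaces or tabs
    if len(parts) < 2:
        continue  # skip blank or incomplete lines
    time.append(float(parts[0]))
    rim.append(float(parts[1]))

print(time, rim)
```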
Here is the code to do this:
import csv
with open('./rvt.txt') as file:
    # skipinitialspace=True collapses runs of spaces after the delimiter
    readfile = csv.reader(file, delimiter=" ", skipinitialspace=True)
    time = []
    rim = []
    for line in readfile:
        t = line[0]
        r = line[1]
        time.append(t)
        rim.append(r)
print(time, rim)
I have the datafile below, and I want to delete every line that contains the number "30" in the first column. This number is always in this position.
My idea is to read the file, create a list from the first column,
check each item in the list for the number "30", and then delete the entire line at that index.
However, I am not sure how to proceed.
Please let me know your thoughts.
Datafile
Here is what i have tried up to this point:
f = open("file.txt","r")
lines = f.readlines()
f.close()
f = open("file.txt","w")
for line in lines:
    if line!="30"+"\n":
        f.write(line)
f.close()

f = open("file.txt", "r")
lines = f.readlines()
f.close()
f = open("file.txt", "w")
for line in lines:
    if '30' not in line[4:6]:
        f.write(line)
f.close()
Try this
If you're willing to use pandas, you could do it in three lines:
import pandas as pd
# Read in file; dtype=str keeps every column as text so the .str methods work
df = pd.read_csv("file.txt", header=None, sep=r"\s+", dtype=str)
# Remove rows where first column contains '30'
df = df[~df[0].str.contains('30')]
# Save the result
df.to_csv("cleaned.txt", sep='\t', index=False, header=False)
This approach can easily be extended to perform other types of filtering or manipulating your data.
One way to do this is with a regular expression that matches 30 at the beginning of the line:
import re
f = open("file.txt", "r")
lines = f.readlines()
f.close()
f = open("file.txt", "w")
for line in lines:
    # Write back only the lines that do NOT match
    if not re.search(r'^\d*30', line):
        f.write(line)
f.close()
Hope it works well.
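Substring and prefix checks can also hit values such as 130 or 300. If the first column has to be exactly 30, comparing the first whitespace-separated field avoids that. A sketch, assuming whitespace-separated columns; drop_rows and the sample lines are hypothetical because the datafile is not shown:

```python
def drop_rows(lines, unwanted="30"):
    kept = []
    for line in lines:
        fields = line.split()
        # Keep blank lines and any line whose first field differs.
        if not fields or fields[0] != unwanted:
            kept.append(line)
    return kept

# Made-up rows: only the one whose first field is exactly "30" is dropped.
sample = ["10 a b\n", "30 c d\n", "130 e f\n"]
kept = drop_rows(sample)
print(kept)
```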
I am trying to read a file with below data
Et1, Arista2, Ethernet1
Et2, Arista2, Ethernet2
Ma1, Arista2, Management1
I need to read the file and replace Et with Ethernet and Ma with Management. The digit at the end should stay the same. The actual output should be as follows:
Ethernet1, Arista2, Ethernet1
Ethernet2, Arista2, Ethernet2
Management1, Arista2, Management1
I tried code with regular expressions. I can get to the point where I can parse all of Et1, Et2 and Ma1, but I am unable to replace them.
import re
with open('test.txt','r') as fin:
    for line in fin:
        data = re.findall(r'\A[A-Z][a-z]\Z\d[0-9]*', line)
        print(data)
The output looks like this..
['Et1']
['Et2']
['Ma1']
import re
# Compile once to avoid recompiling on each iteration
re_et = re.compile(r'^Et(\d+),')
re_ma = re.compile(r'^Ma(\d+),')
with open('test.txt') as fin:
    for line in fin:
        # Raw strings keep \g<1> from being read as a string escape
        data = re_et.sub(r'Ethernet\g<1>,', line.strip())
        data = re_ma.sub(r'Management\g<1>,', data)
        print(data)
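If more abbreviations get added later, one pattern plus a lookup table may be easier to maintain than one compiled regex per prefix. A sketch only; FULL_NAMES, PATTERN and expand are hypothetical names, and the sample lines come from the question:

```python
import re

# Map each abbreviation to its full name.
FULL_NAMES = {"Et": "Ethernet", "Ma": "Management"}
# Match an abbreviation followed by digits at the start of the line.
PATTERN = re.compile(r"^(Et|Ma)(\d+)")

def expand(line):
    # Swap the abbreviation for its full name and keep the digits.
    return PATTERN.sub(lambda m: FULL_NAMES[m.group(1)] + m.group(2), line)

lines = ["Et1, Arista2, Ethernet1", "Ma1, Arista2, Management1"]
expanded = [expand(line) for line in lines]
print(expanded)
```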
This example follows Joseph Farah's suggestion
import csv
file_name = 'data.csv'
output_file_name = "corrected_data.csv"
data = []
# In Python 3, open csv files in text mode with newline=""
with open(file_name, "r", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        data.append(row)
corrected_data = []
for row in data:
    tmp_row = []
    for col in row:
        # Skip values that already hold the full word
        if 'Et' in col and "Ethernet" not in col:
            col = col.replace("Et", "Ethernet")
        elif 'Ma' in col and "Management" not in col:
            col = col.replace("Ma", "Management")
        tmp_row.append(col)
    corrected_data.append(tmp_row)
with open(output_file_name, "w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for row in corrected_data:
        writer.writerow(row)
print(corrected_data)
Here are the steps you should take:
Read each line in the file
Separate each line into smaller list items using the commas as delimiters
Use str.replace() to replace the characters with the words you want; keep in mind that anything that says "Et" (including the beginning of the word "ethernet") will be replaced, so remember to account for that. Same goes for Ma and Management.
Roll it back into one big list and put it back in the file with file.write(). You may have to overwrite the original file.
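The steps above can be sketched roughly as follows for a single line. This assumes the fields are separated by ", " as in the question and that only the first field needs expanding; expand_first_field is a hypothetical helper name:

```python
def expand_first_field(line):
    # Split the line into fields on the comma-space separator.
    fields = line.rstrip("\n").split(", ")
    first = fields[0]
    # Replace only a leading abbreviation, and skip already-expanded words.
    if first.startswith("Et") and not first.startswith("Ethernet"):
        first = first.replace("Et", "Ethernet", 1)
    elif first.startswith("Ma") and not first.startswith("Management"):
        first = first.replace("Ma", "Management", 1)
    fields[0] = first
    # Join the fields back into one line.
    return ", ".join(fields)

sample = ["Et1, Arista2, Ethernet1\n", "Ma1, Arista2, Management1\n"]
fixed_lines = [expand_first_field(line) for line in sample]
print(fixed_lines)
```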
I create a one-column pandas DataFrame that contains only strings. One row is empty. When I write the file on disk, the empty row gets an empty quote "" while I want no quote at all. Here's how to replicate the issue:
import pandas as pd
df = "Name=Test\n\n[Actual Values]\nLength=12\n"
df = pd.DataFrame(df.split("\n"))
df.to_csv("C:/Users/Max/Desktop/Test.txt", header=False, index=False)
The output file should be like this:
Name=Test
[Actual Values]
Length=12
But instead is like this:
Name=Test
[Actual Values]
""
Length=12
Is there a way to instruct pandas not to write the quotes and leaves an empty row in the output text file? Thank you, a lot.
There is a parameter for DataFrame.to_csv called na_rep. If you have None values, it will replace them with whatever you pass into this field.
import pandas as pd
df = "Name=Test\n"
df += "\n[Actual Values]\n"
df += "Length=12\n"
df = pd.DataFrame(df.split("\n"))
df[df[0]==""] = None
df.to_csv("pandas_test.txt", header=False, index=False, na_rep=" ")
Unfortunately, it looks like passing in na_rep="" will print quotes into the csv. However, if you pass in a single space (na_rep=" ") it looks better aesthetically...
Of course you could always write your own function to output a csv, or simply replace the "" in the output file using:
f = open(filename, 'r')
text = f.read()
f.close()
text = text.replace("\"\"","")
f = open(filename, 'w')
f.write(text)
f.close()
And here's how you could write your own to_csv() method:
def to_csv(df, filename, separator):
    f = open(filename, 'w')
    # df.values yields one array per row; write each value in turn
    for row in df.values:
        for value in row:
            f.write(value + separator)
    f.close()
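Since every row here is a single plain string, another option is to skip CSV formatting altogether and write the lines directly, which leaves the empty row truly empty. A minimal sketch that writes to a temporary directory (the path is an assumption for the demo) and reads the file back to check it:

```python
import os
import tempfile

text = "Name=Test\n\n[Actual Values]\nLength=12\n"
lines = text.split("\n")

# Write to a throwaway location; replace with the real target path.
path = os.path.join(tempfile.mkdtemp(), "Test.txt")
with open(path, "w") as f:
    # Joining with "\n" writes each item on its own line, with no quoting.
    f.write("\n".join(lines))

with open(path) as f:
    written = f.read()
print(written)
```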