Python convert TXT to CSV

I've been trying to convert a txt file to CSV but have been running into trouble.
My text document is in the following format:
POP Issue: key=u'VPER-242', id=u'167782'
POP Issue: key=u'TE-8', id=u'215771'
POP Issue: key=u'OUTDIAL-233', id=u'223166'
POP Issue: key=u'OUTDIAL-232', id=u'223047'
The goal is to throw this into a CSV file that looks like the following with 2 columns:
Name of issue
POP Issue: key=u'VPER-242'
POP Issue: key=u'TE-8'
POP Issue: key=u'OUTDIAL-233'
POP Issue: key=u'OUTDIAL-232'
Issue ID
id=u'167782'
id=u'215771'
id=u'223166'
id=u'223047'
Basically, I want to use the ", " in the txt file as a delimiter to split each line into two columns. The following code has worked to get the column names at the top of my CSV and to do some splitting, but it is not in the right format and doesn't separate on the ", ".
import csv
import itertools
with open('newfile1.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line for line in stripped if line)
    grouped = itertools.izip(*[lines] * 2)
    with open('newfile1.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('Name of Issue', 'Issue ID'))
        writer.writerows(grouped)
This is what the code outputs, which is close but not quite right. I don't want the leading spaces, and I need the Issue ID column to contain only the id=u'number' data and the Name of Issue column to contain only the POP Issue data. Anyone have any suggestions? Thank you!
Name of Issue
POP Issue: key=u'VPER-242', id=u'167782'
POP Issue: key=u'TE-8', id=u'215771'
POP Issue: key=u'OUTDIAL-233', id=u'223166'
Issue ID
POP Issue: key=u'TE-8', id=u'215771'
POP Issue: key=u'OUTDIAL-232', id=u'223047'
POP Issue: key=u'OUTDIAL-229', id=u'222309'

Your code uses itertools.izip to zip the same iterator with itself, so it only pairs up consecutive whole lines; nothing ever gets split at the comma, which is why full lines show up under both columns. You need to split each line on the comma and go from there.

import csv

txt_file = r"YourTextDocument.txt"
csv_file = r"NewProcessedDoc.csv"

# Read the comma-delimited text and write it straight back out as a csv.
# (The "rb"/"wb" modes and the print statement are Python 2 style; on Python 3
# open the files in text mode with newline='' and use print(...).)
in_txt = csv.reader(open(txt_file, "rb"), delimiter=',')
out_csv = csv.writer(open(csv_file, 'wb'))
out_csv.writerows(in_txt)
print 'done! go check your NewProcessedDoc.csv file'
# You can insert new rows manually in your csv for the titles (Name of issue & Issue ID)

EDITED: more details
Short answer:
Replace this:
grouped = itertools.izip(*[lines] * 2)
with this:
grouped = [line.split(', ') for line in lines]
Longer answer:
Your "grouped" variable is containing pairs of duplicate lines (not what you wanted)
If your Input line doesn't contain any other comma (",") then str.split is your friend for this Mission.
Cheers
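Putting it together, here is a minimal sketch of the whole conversion in Python 3 syntax (where itertools.izip no longer exists), using the file names from the question:
import csv

with open('newfile1.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line for line in stripped if line)
    grouped = [line.split(', ') for line in lines]

with open('newfile1.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(('Name of Issue', 'Issue ID'))
    writer.writerows(grouped)
Each row then comes out with POP Issue: key=u'VPER-242' in the first column and id=u'167782' in the second.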

Related

Remove line break after CSV file aggregation

I am aggregating data in a CSV file; here is the code:
import pandas
df = pandas.read_csv("./input.csv", delimiter=";", low_memory=False)
df.head()
count_severity = df.groupby("PHONE")["IMEI"].unique()
has_multiple_elements = count_severity.apply(lambda x: len(x)>1)
result = count_severity[has_multiple_elements]
result.to_csv("./output.csv", sep=";")
and in some lines of the resulting file the second column (the part after the ; sign) ends up split across two rows.
Could you tell me, please, how to get rid of this line break? I tried adding the parameter line_terminator=None to result.to_csv, but it didn't help.
Any method is accepted, even if you have to overwrite this file and save a new one. I also tried this:
import pandas as pd
output_file = open("./output.csv", "r")
output_file = ''.join([i for i in output_file]).replace("\n", "")
output_file_new = open("./output_new.csv", "w")
output_file_new.writelines(output_file)
output_file_new.close()
But then the whole file collapses into one continuous line, which is not good at all.
To summarize, each PHONE and its list of IMEIs should end up on a single line of the output file.
Thank You!
If your wrong lines always start with a comma, you could just replace the sequence "\n," by ",".
with open("./output.csv", "r") as file:
content = file.read()
new_content = content.replace("\n,", ",")
with open("./new_output.csv", "w") as file:
file.write(new_content)
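Another option, if the stray newlines come from writing out the numpy arrays that .unique() returns, is to join each group's IMEIs into a single string before calling to_csv. A sketch along those lines, using the column and file names from the question:
import pandas as pd

df = pd.read_csv("./input.csv", delimiter=";", low_memory=False)

count_severity = df.groupby("PHONE")["IMEI"].unique()
has_multiple_elements = count_severity.apply(lambda x: len(x) > 1)
result = count_severity[has_multiple_elements]

# Turn each array of IMEIs into one comma-separated string,
# so the cell is written as plain text with no embedded line breaks.
result = result.apply(lambda imeis: ",".join(map(str, imeis)))
result.to_csv("./output.csv", sep=";")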

Python: Replace string in a txt file but not on every occurrence

I am really new to Python and I need to change new artikel IDs to the old ones. The IDs are mapped inside a dict. The file I need to edit is a normal txt file where every column is separated by tabs. The problem is not replacing the values but rather replacing only the occurrences in the desired column, which is set by pos.
I really would appreciate some help.
def replaceArtCol(filename, pos):
    with open(filename) as input_file, open('test.txt', 'w') as output_file:
        for each_line in input_file:
            val = each_line.split("\t")[pos]
            for row in artikel_ID:
                if each_line[pos] == pos:
                    line = each_line.replace(val, artikel_ID[val])
                    output_file.write(line)
This code just replaces every occurrence of the string in the text file.
Supposing your ID mapping dict looks like ID_mapping = {'old_id': 'new_id'}, I think your code is not far from working correctly. A modified version could look like this:
with open(filename) as input_file, open('test.txt', 'w') as output_file:
    for each_line in input_file:
        line = each_line.split("\t")
        if line[pos] in ID_mapping:
            line[pos] = ID_mapping[line[pos]]
        line = '\t'.join(line)
        output_file.write(line)
If you're not working in pandas anyway, this can save a lot of overhead.
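As a concrete illustration of what the mapping does (the IDs here are made up):
ID_mapping = {'A100': 'B200'}
# with pos = 1, an input line  'foo\tA100\tbar\n'
# is written out as            'foo\tB200\tbar\n'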
If your data is tab-separated, you may want to load it into a DataFrame; that way you get a proper columns-and-rows structure. What you are doing right now makes it hard to achieve what you want without some complex and buggy logic. You may try these steps:
import pandas as pd
df = pd.read_csv("dummy.txt", sep="\t", encoding="latin-1")
df['desired_column_name'] = df['desired_column_name'].replace({"value_to_be_changed": "newvalue"})
print(df.head())
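If the file then needs to go back to disk in the same tab-separated form, a line like this should do it (the output file name is just an example):
# Write the modified table back out as a tab-separated file without the index column.
df.to_csv("dummy_updated.txt", sep="\t", index=False, encoding="latin-1")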

Split a string in a CSV file by delimiter

I have a CSV file with the following data:
Date,Profit/Losses
Jan-10,867884
Feb-10,984655
Mar-10,322013
Apr-10,-69417
May-10,310503
Jun-10,522857
Jul-10,1033096
Aug-10,604885
Sep-10,-216386
Oct-10,477532
Nov-10,893810
Dec-10,-80353
I have imported the file in python like so:
with open(csvpath, 'r', errors='ignore') as fileHandle:
    lines = fileHandle.read()
I need to loop through these lines and extract just the months, i.e. "Jan", "Feb", etc., and put them in a different list. I also have to somehow skip the first line, i.e. the Date,Profit/Losses header.
Here's the code I wrote so far:
months = []
for line in lines:
    months.append(line.split("-"))
When I try to print the months list, though, it splits every single character in the file!
Where am I going wrong here?
You can almost always minimize the pain by using specialized tools, such as the csv module and list comprehension:
import csv

with open("yourfile.csv") as infile:
    reader = csv.reader(infile)  # Create a new reader
    next(reader)                 # Skip the first row
    months = [row[0].split("-")[0] for row in reader]
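With the sample data from the question, months then comes out as:
# ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']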
One answer to your question is to use fileHandle.readlines().
lines = fileHandle.readlines()
# print(lines)
# ['Date,Profit/Losses\n', 'Jan-10,867884\n', 'Feb-10,984655\n', 'Mar-10,322013\n',
#  'Apr-10,-69417\n', 'May-10,310503\n', 'Jun-10,522857\n', 'Jul-10,1033096\n', 'Aug-10,604885\n',
#  'Sep-10,-216386\n', 'Oct-10,477532\n', 'Nov-10,893810\n', 'Dec-10,-80353\n']

months = []
for line in lines[1:]:
    # start from the 2nd item in the list since the first line is the header and you just want months
    months.append(line.split("-")[0])
Try this if you really want to do it the hard way:
months = []
for line in lines[1:]:
    months.append(line.split("-")[0])
lines[1:] will skip the first row and line.split("-")[0] will only pull out the month and append to your list months.
However, as suggested by AChampion, you should really look into the csv or pandas packages.
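For reference, a pandas version of the same extraction might look roughly like this (a sketch; the file name is assumed):
import pandas as pd

df = pd.read_csv("yourfile.csv")                    # the header row is picked up automatically
months = df["Date"].str.split("-").str[0].tolist()  # 'Jan-10' -> 'Jan'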
This should deliver the desired results (assuming a file named data.csv in the same directory):
result = []
with open('data.csv', 'r', encoding='UTF-8') as data:
    next(data)  # skip the header row
    for record in data:
        result.append(record.split('-')[0])

Only outputting a few lines into a text file, instead of all of them

I've made a Python script that grabs information from a .csv archive and outputs it into a text file as a list. The original csv file has over 200,000 fields to read from, yet when I run my program it only outputs 36 lines into the .txt file.
Here's the code:
import csv
with open('OriginalFile.csv', 'r') as csvfile:
    emailreader = csv.reader(csvfile)
    f = open('text.txt', 'a')
    for row in emailreader:
        f.write(row[1] + "\n")
And the text file only lists up to 36 strings. How can I fix this? Is the original csv file maybe too big?
After many comments, it turned out the original problem was the encoding of characters in the csv file. If you specify the encoding in pandas, it will read the file just fine.
Any time you are dealing with a csv file (or Excel, SQL or R) I would use pandas DataFrames for this. The syntax is shorter and it is easier to see what is going on.
import pandas as pd

csvframe = pd.read_csv('OriginalFile.csv', encoding='utf-8')
with open('text.txt', 'a') as output:
    # I think what you wanted was the 2nd column from each row.
    # iloc selects by position: ':' means all rows and '1' means the second column
    # (.ix is deprecated in newer pandas, so use .iloc instead).
    output.write('\n'.join(csvframe.iloc[:, 1].astype(str).values))
You might have luck with something like the following:
with open('OriginalFile.csv', 'r') as csvfile:
    emailreader = csv.reader(csvfile)
    with open('text.txt', 'w') as output:
        for line in emailreader:
            output.write(line[1] + '\n')
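Given the note above that the character encoding turned out to be the culprit, the plain csv version can be made to cope the same way by being explicit about the encoding. A sketch, assuming the file is UTF-8 (adjust the codec to whatever the file actually uses):
import csv

# errors='replace' substitutes undecodable bytes instead of raising a
# UnicodeDecodeError part-way through the file
with open('OriginalFile.csv', 'r', encoding='utf-8', errors='replace') as csvfile:
    emailreader = csv.reader(csvfile)
    with open('text.txt', 'w', encoding='utf-8') as output:
        for row in emailreader:
            output.write(row[1] + '\n')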

Copy specific Column from a .csv file to another .csv file and save it as a new .csv file issue

Let's say File1 is the file I want to copy the column from, and File2 is the file where I want that column to be pasted; once pasted, the result should be saved as a new file with the extension .csv. It seems like simple code to write, yet my first attempt has given me this error: "AttributeError: 'file' object has no attribute 'writerow'". Clearly, I have no idea what I am doing wrong here, so I was wondering if you guys could help me. Here is the code I have written so far:
import csv
File1 = 'C:/Users/Alan Cedeno/Desktop/Test_Folder/dyn_0.csv'
File2 = 'C:/Users/Alan Cedeno/Desktop/Test_Folder/HiSAM1_data_160215_164858.csv'
with open(File1, "r") as r, open(File2, "a") as w:
    reader = csv.reader(r, lineterminator="\n")
    writer = csv.writer(w, lineterminator="\n")
    for row in reader:
        w.writerow(row[0])
If the question needs formatting, please let me know. Also, if you think the code will not do what I want, a hint of where I can get started would definitely help. Please keep in mind I am a slow learner, so if you can show me how to make it work step by step that would be a huge help! I just need a starter so I can follow along and write my own. Thanks :o)
Your most immediate problem is that w is the file object... you want writer. But you've got a few other issues. First, you described 3 files, not two. Next, you need to actually insert the column. Finally, you have to decide what to do if the two files have different lengths. In this example I assumed you wanted to take the first column from the first csv file and insert it as the first column in the merged result. I tweaked the file names to (hopefully) make it more clear.
The following code has several techniques for merging csvs as noted in the comments. You need to change them to your circumstances.
import os
import csv

File1 = 'C:/Users/Alan Cedeno/Desktop/Test_Folder/dyn_0.csv'
File2 = 'C:/Users/Alan Cedeno/Desktop/Test_Folder/HiSAM1_data_160215_164858.csv'

root, ext = os.path.splitext(File2)
output = root + '-new.csv'

with open(File1) as r1, open(File2) as r2, open(output, 'w') as w:
    writer = csv.writer(w)
    merge_from = csv.reader(r1)
    merge_to = csv.reader(r2)
    # skip 3 lines of headers in the "from" file
    for _ in range(3):
        next(merge_from)
    for merge_from_row, merge_to_row in zip(merge_from, merge_to):
        # insert merge_from col 0 as merge_to col 0
        merge_to_row.insert(0, merge_from_row[0])
        # replace merge_to col 1 with merge_from col 3
        merge_to_row[1] = merge_from_row[3]
        # delete merge_to cols 5,6,7 completely
        del merge_to_row[5:8]
        writer.writerow(merge_to_row)
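If all you actually need is the plain column copy (no header skipping, no other column edits), the same pattern reduces to a sketch like this, reusing the file names above and assuming both files have the same number of rows:
import csv

with open(File1) as r1, open(File2) as r2, open(output, 'w') as w:
    writer = csv.writer(w)
    for from_row, to_row in zip(csv.reader(r1), csv.reader(r2)):
        # prepend File1's first column to the corresponding File2 row
        writer.writerow([from_row[0]] + to_row)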
