Looping through a dictionary to replace multiple values in text file - python

I'm trying to change several hex values in a text file. I made a CSV that has the original values in one column and the new values in another.
My goal is to write a simple Python script to find old values in the text file based on the first column and replace them with new values in the second.
I'm attempting to use a dictionary, which I created by looping through the CSV, to drive the replace() calls. Building it was pretty easy, but using it to execute a replace() hasn't been working out. When I print out the values after my script runs, I'm still seeing the original ones.
I've tried reading in the text file using read() and applying the change to the whole file, as shown here:
import csv
filename = "origin.txt"
csv_file = 'replacements.csv'
conversion_dict = {}
# Create conversion dictionary
with open(csv_file, "r") as replace:
    reader = csv.reader(replace, delimiter=',')
    for rows in reader:
        conversion_dict.update({rows[0]:rows[1]})
#Replace values on text files based on conversion dict
with open(filename, "r") as fileobject:
    txt = str(fileobject.read())
    for keys, values, in conversion_dict.items():
        new_text = txt.replace(keys, values)
I've also tried adding the updated text to a list:
#Replace values on text files based on conversion dict
with open(filename, "r") as fileobject:
    txt = str(fileobject.read())
    for keys, values, in conversion_dict.items():
        new_text.append(txt.replace(keys, values))
Then, I tried using readlines() to replace the old values with new ones one line at a time:
# Replace values on text files based on conversion dict
with open(filename, "r") as reader:
    reader.readlines()
    type(reader)
    for line in reader:
        print(line)
        for keys, values, in conversion_dict.items():
            new_text.append(txt.replace(keys, values))
While troubleshooting, I ran a test to see if I was getting any matches between the keys in my dict and the text in the file:
for keys, values, in conversion_dict.items():
    if keys in txt:
        print("match")
    else:
        print("no match")
My output returned match on every row except the first one. I imagine with some trimming or something I could fix that. However, this proves that there are matches, so there must be some other issue with my code.
Any help is appreciated.

origin.txt:
oldVal9000,oldVal1,oldVal2,oldVal3,oldVal69
test.csv:
oldVal1,newVal1
oldVal2,newVal2
oldVal3,newVal3
oldVal4,newVal4
import csv
filename = "origin.txt"
csv_file = 'test.csv'
conversion_dict = {}
with open(csv_file, "r") as replace:
    reader = csv.reader(replace, delimiter=',')
    for rows in reader:
        conversion_dict.update({rows[0]:rows[1]})
f = open(filename,'r')
txt = str(f.read())
f.close()
txt= txt.split(',') #not sure what your origin.txt actually looks like, assuming comma separated values
for i in range(len(txt)):
    if txt[i] in conversion_dict:
        txt[i] = conversion_dict[txt[i]]
with open(filename, "w") as outfile:
    outfile.write(",".join(txt))
modified origin.txt:
oldVal9000,newVal1,newVal2,newVal3,oldVal69
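For reference, the loop in the question fails because every call to txt.replace() starts again from the unmodified txt, so new_text only ever holds the result of the last replacement. A minimal sketch of a chained version, assuming plain substring replacement is acceptable (the function name apply_replacements is made up for illustration):

```python
def apply_replacements(text, mapping):
    """Apply every old -> new substitution in turn, carrying the result forward."""
    for old, new in mapping.items():
        # Reassign to the same variable so earlier replacements are kept.
        text = text.replace(old, new)
    return text


conversion_dict = {"oldVal1": "newVal1", "oldVal2": "newVal2", "oldVal3": "newVal3"}
result = apply_replacements("oldVal9000,oldVal1,oldVal2,oldVal3,oldVal69", conversion_dict)
print(result)
```

Note that str.replace() matches substrings, so a key such as oldVal1 would also match inside oldVal10. Comparing whole comma-separated fields, as the answer above does, avoids that.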

Related

Same python code block gives different outputs at different time

I want to create a word dictionary. The dictionary looks like
words_meanings= {
    "rekindle": "relight",
    "pesky":"annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability":"responsibility",
}
keys_letter=[]
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
Output: rekindle , pesky, verge, maneuver, accountability
Here rekindle, pesky, verge, maneuver, accountability are the keys and relight, annoying, border, activity, responsibility are the values.
Now I want to create a csv file and my code will take input from the file.
The file looks like
rekindle | pesky | verge | maneuver | accountability
relight | annoying| border| activity | responsibility
So far I use this code to load the file and read data from it.
from google.colab import files
uploaded = files.upload()
import pandas as pd
data = pd.read_csv("words.csv")
data.head()
import csv
reader = csv.DictReader(open("words.csv", 'r'))
words_meanings = []
for line in reader:
    words_meanings.append(line)
print(words_meanings)
This is the output of print(words_meanings)
[OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
It looks very odd to me.
keys_letter=[]
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
Now I create an empty list and want to append only the keys. But the output is [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
I am confused. In the first code block the list contained only the keys, but now it contains both the keys and their values. How can I fix this?
I would suggest that you format your csv with your key and value on the same row. Like this
rekindle,relight
pesky,annoying
verge,border
This way the following code will work.
file_name = "words.csv"
words_meanings = {}
with open(file_name, 'r') as file:
    for line in file.readlines():
        key, value = line.split(",")
        words_meanings[key] = value.rstrip("\n")
If you want a list of the keys:
list_of_keys = list(words_meanings.keys())
To add keys and values to the file:
def add_values(key:str, value:str, file_name:str):
    with open(file_name, 'a') as file:
        file.write(f"\n{key},{value}")
key = input("Input the key you want to save: ")
value = input(f"Input the value you want to save to {key}:")
add_values(key, value, file_name)
You run the same block of code, but on different objects, and that gives different results.
First you use a normal dictionary (check type(words_meanings))
words_meanings = {
    "rekindle": "relight",
    "pesky":"annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability":"responsibility",
}
and the for-loop gives you the keys of this dictionary.
You could get the same with
keys_letter = list(words_meanings.keys())
or even
keys_letter = list(words_meanings)
Later you use a list with a single dictionary inside it (check type(words_meanings))
words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
and the for-loop gives you the elements of this list, not the keys of the dictionary inside it. So you copy the whole dictionary from one list to another.
You could get the same with
keys_letter = words_meanings.copy()
or, equivalently,
keys_letter = list(words_meanings)
from collections import OrderedDict
words_meanings = {
    "rekindle": "relight",
    "pesky":"annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability":"responsibility",
}
print(type(words_meanings))
keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
#keys_letter = list(words_meanings.keys())
keys_letter = list(words_meanings)
print(keys_letter)
words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
print(type(words_meanings))
keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
#keys_letter = words_meanings.copy()
keys_letter = list(words_meanings)
print(keys_letter)
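If the goal is to collect only the keys from the rows that csv.DictReader returns, one option is to loop over the list and then over each dictionary in it. A small sketch, using a plain dict in place of the reader output (the variable name rows is illustrative):

```python
rows = [{"rekindle": "relight", "pesky": "annoying"}]

keys_letter = []
for row in rows:
    # Iterating over a dict yields its keys, so extend() adds only the keys.
    keys_letter.extend(row)

print(keys_letter)  # ['rekindle', 'pesky']
```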
The default field separator for the csv module is a comma. Your CSV file uses the pipe or bar symbol |, and the fields also seem to be fixed width. So, you need to specify | as the delimiter to use when creating the CSV reader.
Also, your CSV file is encoded as Big-endian UTF-16 Unicode text (UTF-16-BE). The file contains a byte-order-mark (BOM) but Python is not stripping it off, so you will notice the string '\ufeffrekindle' contains the FEFF UTF-16-BE BOM. That can be dealt with by specifying encoding='utf16' when you open the file.
import csv
with open('words.csv', newline='', encoding='utf-16') as f:
    reader = csv.DictReader(f, delimiter='|', skipinitialspace=True)
    for row in reader:
        print(row)
Running this on your CSV file produces this:
{'rekindle ': 'relight ', 'pesky ': 'annoying', 'verge ': 'border', 'maneuver ': 'activity ', 'accountability': 'responsibility'}
Notice that there is trailing whitespace in the key and values. skipinitialspace=True removed the leading whitespace, but there is no option to remove the trailing whitespace. That can be fixed by exporting the CSV file from Excel without specifying a field width. If that can't be done, then it can be fixed by preprocessing the file using a generator:
import csv
def preprocess_csv(f, delimiter=','):
    # assumes that fields can not contain embedded new lines
    for line in f:
        yield delimiter.join(field.strip() for field in line.split(delimiter))
with open('words.csv', newline='', encoding='utf-16') as f:
    reader = csv.DictReader(preprocess_csv(f, '|'), delimiter='|', skipinitialspace=True)
    for row in reader:
        print(row)
which now outputs the stripped keys and values:
{'rekindle': 'relight', 'pesky': 'annoying', 'verge': 'border', 'maneuver': 'activity', 'accountability': 'responsibility'}
Since no one was able to give me a working answer, I am finally posting my own here. I hope it helps others.
import csv
file_name="words.csv"
words_meanings = {}
with open(file_name, newline='', encoding='utf-8-sig') as file:
    for line in file.readlines():
        key, value = line.split(",")
        # newline='' keeps "\r\n" line endings intact, so strip both characters
        words_meanings[key] = value.rstrip("\r\n")
print(words_meanings)
This is the code to load a CSV file into a dictionary. Enjoy!
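The 'utf-8-sig' encoding is what removes the '\ufeff' prefix seen earlier: it decodes UTF-8 and drops a leading byte-order mark if one is present. Here is a sketch of the same idea using csv.reader, which also copes with quoted fields. The in-memory bytes stand in for words.csv and are an assumption made for illustration:

```python
import csv
import io

# Simulated file contents: a UTF-8 byte-order mark followed by key,value rows.
raw = "\ufeffrekindle,relight\npesky,annoying\n".encode("utf-8")

with io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="") as f:
    # Skip empty rows; each remaining row is [key, value].
    words_meanings = {row[0]: row[1] for row in csv.reader(f) if row}

print(words_meanings)
```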

Python how to get the tweet data using specific word in csv file and put it in new csv file

I have Twitter data in a CSV file (collected with a Python API), around 1000 rows. Now I want to filter the tweets for the specific Indonesian words “macet” or “kecelakaan” (in English “traffic” or “accident”) and put the matching rows into a new, separate CSV file, just like using Find All in Excel.
The sample Twitter data is in example1.csv, and the new file to be created after searching for "macet" or "kecelakaan" is example2.csv. But my code produces no result.
import re
import csv
with open('example1.csv', 'r') as csvFile:
    reader = csv.reader(csvFile)
    if re.search(r'macet', reader):
        for row in reader:
            myData = list(row)
            print(row)
newFile = open('example2.csv', 'w')
with newFile:
    writer = csv.writer(newFile)
    writer.writerows(myData)
print("Writing complete")
I use Spyder with Python 3.6 on Windows.
The CSV file is already in the same folder as the script. Here is a screen capture of my CSV Twitter data:
[image: myCSVtwitterData]
Update: added a sample of the CSV file.
There are a couple of problems with your code.
In your reading loop you are passing a csv.reader object to re.search, but it doesn't know how to search that object. You need to pass it text or byte strings.
The line
myData = list(row)
converts row into a new list and saves it to myData, but it's already a list, so no conversion is necessary. And that line replaces the previous contents of myData, but you actually want to save all the matching rows. However, there's no need to save the rows, you can just write them to the new file as you go.
Anyway, here's a repaired version of your code. From the screen shot it looks like you only want to search the text in column 2 of the input data (which corresponds to column C in your spreadsheet). I've created a regex that searches for the whole words "macet" and "kecelakaan", the "\b" matches at word boundaries so we don't get a match if "macet" or "kecelakaan" is part of a larger word.
import re
import csv
# Make a case-insensitive regex to match the words "macet" or "kecelakaan"
pattern = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)
with open('example1.csv', 'r', newline='') as csvFile, open('example2.csv', 'w', newline='') as newFile:
    reader = csv.reader(csvFile)
    writer = csv.writer(newFile)
    for row in reader:
        # Skip empty rows
        if not row:
            continue
        if pattern.search(row[2]):
            print(row)
            writer.writerow(row)
print("Writing complete")
I've just made a couple of improvements to that code. It now uses the newline='' arg to open the CSV files, and it skips any empty lines in the input CSV. And the regex now ignores the case when looking for matching words.
This does not answer the Python question, but if you are on Linux you can do it with a single command:
grep -i "macet" example1.csv > example2.csv
-i ignores case, so it will also match "Macet".
How about this?
This code visits the rows one by one, finds the cells that contain a word from word_list, and writes those cells as a row in the new file.
import re
import csv
word_list = ['macet', 'kecelakaan']
with open('example1.csv', 'r') as csvFile, open('example2.csv', 'w') as newFile:
    reader = csv.reader(csvFile)
    writer = csv.writer(newFile, lineterminator='\n')
    for row in reader:
        new_row = [content for content in row if any(map(lambda word: word in content, word_list))]
        if(new_row != []):
            print(new_row)
            writer.writerow(new_row)
print("Writing complete")

writing the data in text file while converting it to csv

I am very new to Python. I have a .txt file and want to convert it to a .csv file in a format I was given, but I could not manage it. A hand would be useful. I am going to explain it with screenshots.
I have a txt file named bip.txt, and the data inside it looks like this (screenshot).
I want to convert it to a CSV file like this (screenshot).
So far, all I have managed is to write all the data from the text file with this code:
import glob
read_files = glob.glob("C:/Users/Emrehana1/Desktop/bip.txt")
with open("C:/Users/Emrehana1/Desktop/Test_Result_Report.csv", "w") as outfile:
    for f in read_files:
        with open(f, "r") as infile:
            outfile.write(infile.read())
So is there a solution to convert it to a csv file in the format I desire? I hope I have explained it clearly.
There's no need to use the glob module if you only have one file and you already know its name. You can just open it. It would have been helpful to quote your data as text, since as an image someone wanting to help you can't just copy and paste your input data.
For each entry in the input file you will have to read multiple lines to collect together the information you need to create an entry in the output file.
One way is to loop over the lines of input until you find one that begins with "test:", then get the next line in the file using next() to create the entry:
The following code will produce the split you need - creating the csv file can be done with the standard library module, and is left as an exercise. I used a different file name, as you can see.
with open("/tmp/blip.txt") as f:
    for line in f:
        if line.startswith("test:"):
            test_name = line.strip().split(None, 1)[1]
            # the outcome is expected on the line right after the test name
            result = next(f)
            if not result.startswith("outcome:"):
                raise ValueError("Test name not followed by outcome for test "+test_name)
            outcome = result.strip().split(None, 1)[1]
            print(test_name, outcome)
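The CSV step that is left as an exercise could look roughly like the sketch below. It assumes the input consists of lines starting with "test:", each followed by a line starting with "outcome:". The screenshot of bip.txt is not reproduced here, so the sample text is made up. The helper name parse_results is also hypothetical, and the output goes to an in-memory buffer rather than a real file.

```python
import csv
import io

# Made-up sample input; the real bip.txt layout is an assumption.
sample = """test: login
outcome: passed

test: logout
outcome: failed
"""


def parse_results(lines):
    """Yield (test_name, outcome) pairs from an iterable of lines."""
    lines = iter(lines)
    for line in lines:
        if line.startswith("test:"):
            test_name = line.split(":", 1)[1].strip()
            # The outcome is expected on the very next line.
            result = next(lines, "")
            if not result.startswith("outcome:"):
                raise ValueError("Test name not followed by outcome for test " + test_name)
            yield test_name, result.split(":", 1)[1].strip()


out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["test", "outcome"])  # header row
writer.writerows(parse_results(io.StringIO(sample)))
print(out.getvalue())
```

To write a real file instead, open it with open("Test_Result_Report.csv", "w", newline="") and pass that file object to csv.writer.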
You do not use the glob function to open a file; it searches for file names matching a pattern. You could open the file bip.txt, read each line and put the values into a list, then, when all of the values have been found, join them with commas and newlines and write the result to a csv file, like this:
# set the csv column headers
values = [["test", "outcome"]]
current_row = []
with open("bip.txt", "r") as f:
    for line in f:
        # when a blank line is found, append the row
        if line == "\n" and current_row != []:
            values.append(current_row)
            current_row = []
        if ":" in line:
            # get the value after the colon
            value = line[line.index(":")+1:].strip()
            current_row.append(value)
# append the final row to the list, unless it was already added at a blank line
if current_row:
    values.append(current_row)
# join the columns with a comma and the rows with a new line
csv_result = ""
for row in values:
    csv_result += ",".join(row) + "\n"
# output the csv data to a file
with open("Test_Result_Report.csv", "w") as f:
    f.write(csv_result)

Python: How to capitalize the first column of a .txt file.

I have a .csv formatted .txt file. I am deliberating over the best manner in which to .capitalize the text in the first column.
.capitalize() is a string method, so I considered the following: I would need to open the file, convert the data to a list of strings, capitalize the required word and finally write the data back to the file.
To achieve this, I did the following:
import csv
newGuestList = []
with open("guestList.txt","r+") as guestFile :
    guestList = csv.reader(guestFile)
    for guest in guestList :
        for guestInfo in guest :
            capitalisedName = guestInfo.capitalize()
            newGuestList.append(capitalisedName)
Which gives the output:
['Peter', '35', ' spain', 'Caroline', '37', 'france', 'Claire', '32', ' sweden']
The problem:
Firstly; in order to write this new list back to file, I will need to convert it to a string. I can achieve this using the .join method. However, how can I introduce a newline, \n, after every third word (the country) so that each guest has their own line in the text file?
Secondly; this method, of nested for loops etc. seems highly convoluted, is there a cleaner way?
My .txt file:
peter, 35, spain\n
caroline, 37, france\n
claire, 32, sweden\n
You don't need to split the lines, since the first character of the first word is the first character of the line:
with open("lst.txt","r") as guestFile :
    lines=guestFile.readlines()
newlines=[line.capitalize() for line in lines]
with open("lst.txt","w") as guestFile :
    guestFile.writelines(newlines)
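One thing to keep in mind with this shortcut: str.capitalize() also lowercases every other character of the string, so any capitals later in the line are lost. If that matters, a variant that touches only the text before the first comma might look like this (the helper name capitalize_first_field is made up for illustration):

```python
def capitalize_first_field(line):
    """Capitalize only the text before the first comma; leave the rest untouched."""
    first, sep, rest = line.partition(",")
    return first.capitalize() + sep + rest


whole_line = "peter, 35, Spain\n".capitalize()
first_only = capitalize_first_field("peter, 35, Spain\n")
print(whole_line, end="")   # Peter, 35, spain
print(first_only, end="")   # Peter, 35, Spain
```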
You can just use a CSV reader and writer and access the element you want to capitalize from the list.
import csv
import os
inp = open('a.txt', 'r')
out = open('b.txt', 'w')
reader = csv.reader(inp)
writer = csv.writer(out)
for row in reader:
    row[0] = row[0].capitalize()
    writer.writerow(row)
inp.close()
out.close()
os.rename('b.txt', 'a.txt') # if you want to keep the same name

Remove double quotes from iterator when using csv writer

I want to create a csv from an existing csv, by splitting its rows.
Input csv:
A,R,T,11,12,13,14,15,21,22,23,24,25
Output csv:
A,R,T,11,12,13,14,15
A,R,T,21,22,23,24,25
So far my code looks like:
def update_csv(name):
    #load csv file
    file_ = open(name, 'rb')
    #init first values
    current_a = ""
    current_r = ""
    current_first_time = ""
    file_content = csv.reader(file_)
    #LOOP
    for row in file_content:
        current_a = row[0]
        current_r = row[1]
        current_first_time = row[2]
        i = 2
        #Write row to new csv
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow((current_a,
                             current_r,
                             current_first_time,
                             ",".join((row[x] for x in range(i+1,i+5)))
                             ))
        #do only one row, for debug purposes
        return
But the row contains double quotes that I can't get rid of:
A002,R051,02-00-00,"05-21-11,00:00:00,REGULAR,003169391"
I've tried to use writer = csv.writer(f,quoting=csv.QUOTE_NONE) and got a _csv.Error: need to escape, but no escapechar set.
What is the correct approach to delete those quotes?
I think you could simplify the logic to split each row into two using something along these lines:
def update_csv(name):
    with open(name, 'rb') as file_:
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            # read one row from input csv
            for row in csv.reader(file_):
                # write 2 rows to new csv
                writer.writerow(row[:8])
                writer.writerow(row[:3] + row[8:])
writer.writerow expects an iterable and writes each item in it as one field, separated by the appropriate delimiter, into the file. So:
writer.writerow([1, 2, 3])
would write "1,2,3\n" to the file.
Your call provides it with an iterable, one of whose items is a string that already contains the delimiter. It therefore needs some way either to escape the delimiter or to quote that item. For example,
writer.writerow([1, '2,3'])
doesn't just give "1,2,3\n", but e.g. '1,"2,3"\n' - the string counts as one item in the output.
Therefore if you want to not have quotes in the output, you need to provide an escape character (e.g. '/') to mark the delimiters that shouldn't be counted as such (giving something like "1,2/,3\n").
However, I think what you actually want to do is include all of those elements as separate items. Don't ",".join(...) them yourself, try:
writer.writerow((current_a, current_r,
                 current_first_time, *row[i+1:i+5]))
to provide the relevant items from row as separate items in the tuple.
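The difference between a pre-joined field and separate fields can be seen in a short sketch. It uses the sample row from the question and writes to in-memory buffers instead of files. The names joined and separate are illustrative, and lineterminator is set to "\n" only to keep the printed output simple.

```python
import csv
import io

row = ["A", "R", "T", "11", "12", "13", "14", "15", "21", "22", "23", "24", "25"]

# Joining fields by hand produces ONE field containing commas, so it gets quoted.
joined = io.StringIO()
csv.writer(joined, lineterminator="\n").writerow(row[:3] + [",".join(row[3:8])])

# Passing the fields separately lets the writer insert the delimiters itself.
separate = io.StringIO()
writer = csv.writer(separate, lineterminator="\n")
writer.writerow(row[:8])            # first three fields plus the first group
writer.writerow(row[:3] + row[8:])  # first three fields plus the second group

print(joined.getvalue(), end="")
print(separate.getvalue(), end="")
```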
