Same Python code block gives different outputs at different times - python

I want to create a word dictionary. The dictionary looks like
words_meanings = {
    "rekindle": "relight",
    "pesky": "annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability": "responsibility",
}
keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
Output: rekindle, pesky, verge, maneuver, accountability
Here rekindle, pesky, verge, maneuver, and accountability are the keys, and relight, annoying, border, activity, and responsibility are the values.
Now I want to create a csv file and my code will take input from the file.
The file looks like
rekindle | pesky | verge | maneuver | accountability
relight | annoying| border| activity | responsibility
So far I use this code to load the file and read data from it.
from google.colab import files
uploaded = files.upload()

import pandas as pd
data = pd.read_csv("words.csv")
data.head()

import csv
reader = csv.DictReader(open("words.csv", 'r'))
words_meanings = []
for line in reader:
    words_meanings.append(line)
print(words_meanings)
This is the output of print(words_meanings)
[OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
It looks very odd to me.
keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)
Now I create an empty list and want to append only the keys. But the output is [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
I am confused. As per the first code block it only included keys but now it includes both keys and their values. How can I overcome this situation?

I would suggest that you format your csv with your key and value on the same row, like this:
rekindle,relight
pesky,annoying
verge,border
This way the following code will work.
words_meanings = {}
with open(file_name, 'r') as file:
    for line in file.readlines():
        key, value = line.split(",")
        words_meanings[key] = value.rstrip("\n")
If you want a list of the keys:
list_of_keys = list(words_meanings.keys())
To add keys and values to the file:
def add_values(key: str, value: str, file_name: str):
    with open(file_name, 'a') as file:
        file.writelines(f"\n{key},{value}")

key = input("Input the key you want to save: ")
value = input(f"Input the value you want to save to {key}: ")
add_values(key, value, file_name)

You run the same block of code but you use it with different objects, and this gives different results.
First you use a normal dictionary (check type(words_meanings)):
words_meanings = {
    "rekindle": "relight",
    "pesky": "annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability": "responsibility",
}
and the for-loop gives you the keys from this dictionary.
You could get the same with
keys_letter = list(words_meanings.keys())
or even
keys_letter = list(words_meanings)
Later you use a list with a single dictionary inside it (check type(words_meanings)):
words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
and the for-loop gives you the elements of this list, not the keys of the dictionary inside it. So you move the full dictionary from one list to another.
You could get the same with
keys_letter = words_meanings.copy()
or even the same
keys_letter = list(words_meanings)
from collections import OrderedDict

words_meanings = {
    "rekindle": "relight",
    "pesky": "annoying",
    "verge": "border",
    "maneuver": "activity",
    "accountability": "responsibility",
}
print(type(words_meanings))

keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)

#keys_letter = list(words_meanings.keys())
keys_letter = list(words_meanings)
print(keys_letter)

words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
print(type(words_meanings))

keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)

#keys_letter = words_meanings.copy()
keys_letter = list(words_meanings)
print(keys_letter)
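If what you actually want are the keys of the dictionary sitting inside the list, index into the list first. A minimal sketch:

first_dict = words_meanings[0]          # the OrderedDict inside the list
keys_letter = list(first_dict.keys())
print(keys_letter)  # ['\ufeffrekindle', 'pesky']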

The default field separator for the csv module is a comma. Your CSV file uses the pipe or bar symbol |, and the fields also seem to be fixed width. So, you need to specify | as the delimiter to use when creating the CSV reader.
Also, your CSV file is encoded as Big-endian UTF-16 Unicode text (UTF-16-BE). The file contains a byte-order-mark (BOM) but Python is not stripping it off, so you will notice the string '\ufeffrekindle' contains the FEFF UTF-16-BE BOM. That can be dealt with by specifying encoding='utf-16' when you open the file.
import csv

with open('words.csv', newline='', encoding='utf-16') as f:
    reader = csv.DictReader(f, delimiter='|', skipinitialspace=True)
    for row in reader:
        print(row)
Running this on your CSV file produces this:
{'rekindle ': 'relight ', 'pesky ': 'annoying', 'verge ': 'border', 'maneuver ': 'activity ', 'accountability': 'responsibility'}
Notice that there is trailing whitespace in the key and values. skipinitialspace=True removed the leading whitespace, but there is no option to remove the trailing whitespace. That can be fixed by exporting the CSV file from Excel without specifying a field width. If that can't be done, then it can be fixed by preprocessing the file using a generator:
import csv

def preprocess_csv(f, delimiter=','):
    # assumes that fields cannot contain embedded newlines
    for line in f:
        yield delimiter.join(field.strip() for field in line.split(delimiter))

with open('words.csv', newline='', encoding='utf-16') as f:
    reader = csv.DictReader(preprocess_csv(f, '|'), delimiter='|', skipinitialspace=True)
    for row in reader:
        print(row)
which now outputs the stripped keys and values:
{'rekindle': 'relight', 'pesky': 'annoying', 'verge': 'border', 'maneuver': 'activity', 'accountability': 'responsibility'}
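If you then want the dictionary the question started with, the rows can be collapsed into one dict. A minimal sketch (use in place of the print loop above; assumes the file keeps the question's layout of a header row of words over a data row of meanings):

words_meanings = {}
for row in reader:
    words_meanings.update(row)  # each DictReader row already maps word -> meaning
keys_letter = list(words_meanings)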

Since no one was able to help me with the answer, I am finally posting it here myself. Hope this will help others.
import csv

file_name = "words.csv"
words_meanings = {}
with open(file_name, newline='', encoding='utf-8-sig') as file:
    for line in file.readlines():
        key, value = line.split(",")
        words_meanings[key] = value.rstrip("\n")
print(words_meanings)
This is the code to turn a csv into a dictionary. Enjoy!!!
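With the dictionary built this way, the original loop from the question works as expected and collects only the keys:

keys_letter = []
for x in words_meanings:
    keys_letter.append(x)
print(keys_letter)  # no stray '\ufeff' prefix, thanks to encoding='utf-8-sig'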

Related

'open' and 'csv.DictReader' cannot read file correctly

I have a simple .csv file encoded in windows-1250. Two columns with key-value pairs, separated by semicolons. I would like to create a dictionary from this data. I used this solution: How to read a .csv file into a dictionary in Python. Code below:
import os
import csv

strpath = r"C:\Project Folder"
filename = "to_dictionary.csv"
os.chdir(strpath)
test_csv = open(filename, mode="r", encoding="windows-1250")
dctreader = csv.DictReader(test_csv)
ordereddct = list(dctreader)[0]
finaldct = dict(ordereddct)
print(finaldct)
First of all, this file has 370 rows but I receive only two. Second, Python reads the whole first row as a key and the next row as a value (and then stops, as I mentioned).
# source data
# a;A
# b;B
# c;C
# ... up to 370 rows
# what I need (example; there should be 368 pairs more of course)
finaldct = {"a": "A", "b": "B"}
# what I receive
finaldct = {"a;A": "b;B"}
I have no idea why this happens and couldn't find any working solution.
Note: I would like to avoid using pandas because it seems to work slower in this case.
file has 370 rows but I receive only two
This might be caused by problems with newlines (they do differ between systems; see the Newline Wikipedia entry if you want to know more). The csv module docs suggest using newline='', i.e. in your case:
test_csv = open(filename, newline='', mode="r", encoding="windows-1250")
If you have a file with just two columns (assuming they're unquoted, etc.), you don't need to use the csv module at all.
dct = {}
with open("file.txt", encoding="windows-1250") as f:
    for line in f:
        key, _, value = line.rstrip("\r\n").partition(";")
        dct[key] = value
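A small aside on why partition is used here instead of split: a malformed line without a semicolon still unpacks cleanly instead of raising. For example:

print("a;A".partition(";"))     # ('a', ';', 'A')
print("orphan".partition(";"))  # ('orphan', '', '') -- key maps to an empty value
# "orphan".split(";") returns ['orphan'], so `key, value = ...` would raise ValueError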
Thank you all, but I finally managed to do it (and without looping)! The solution from Kite misled me a little. Here is my code:
import os
import csv

strpath = r"C:\Project Folder"
filename = "to_dictionary.csv"
os.chdir(strpath)
test_csv = open(filename, mode="r", encoding="windows-1250")
csvreader = csv.reader(test_csv, delimiter=";")
finaldct = dict(csvreader)
print(finaldct)
So, I needed to specify the delimiter, but in the reader. Second, there's no need to use DictReader; passing the reader straight to dict() suffices.
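Combining this with the newline='' advice above gives a compact version of the same idea; a minimal sketch:

import csv

with open("to_dictionary.csv", newline='', mode="r", encoding="windows-1250") as test_csv:
    finaldct = dict(csv.reader(test_csv, delimiter=";"))
print(finaldct)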
You can try the below:
data = dict()
with open('test.csv') as f:
    for line in f:
        temp = line.strip()
        if temp:
            k, v = temp.split(';')
            data[k] = v
print(data)
test.csv
1;2
3;5
78;8
6;0
output
{'1': '2', '3': '5', '78': '8', '6': '0'}

Looping through a dictionary to replace multiple values in text file

I'm trying to change several hex values in a text file. I made a CSV that has the original values in one column and the new values in another.
My goal is to write a simple Python script to find old values in the text file based on the first column and replace them with new values in the second.
I'm attempting to use a dictionary, created by looping through the CSV, to facilitate this replace(). Building it was pretty easy, but using it to execute a replace() hasn't been working out. When I print out the values after my script runs, I'm still seeing the original ones.
I've tried reading in the text file using read() and applying the change to the whole file, as below.
import csv

filename = "origin.txt"
csv_file = 'replacements.csv'
conversion_dict = {}

# Create conversion dictionary
with open(csv_file, "r") as replace:
    reader = csv.reader(replace, delimiter=',')
    for rows in reader:
        conversion_dict.update({rows[0]: rows[1]})

# Replace values on text files based on conversion dict
with open(filename, "r") as fileobject:
    txt = str(fileobject.read())
    for keys, values in conversion_dict.items():
        new_text = txt.replace(keys, values)
I've also tried adding the updated text to a list:
# Replace values on text files based on conversion dict
with open(filename, "r") as fileobject:
    txt = str(fileobject.read())
    for keys, values in conversion_dict.items():
        new_text.append(txt.replace(keys, values))
Then, I tried using readlines() to replace the old values with new ones one line at a time:
# Replace values on text files based on conversion dict
with open(filename, "r") as reader:
    reader.readlines()
    type(reader)
    for line in reader:
        print(line)
        for keys, values in conversion_dict.items():
            new_text.append(txt.replace(keys, values))
While troubleshooting, I ran a test to see if I was getting any matches between the keys in my dict and the text in the file:
for keys, values in conversion_dict.items():
    if keys in txt:
        print("match")
    else:
        print("no match")
My output returned match on every row except the first one. I imagine with some trimming or something I could fix that. However, this proves that there are matches, so there must be some other issue with my code.
Any help is appreciated.
origin.txt:
oldVal9000,oldVal1,oldVal2,oldVal3,oldVal69
test.csv:
oldVal1,newVal1
oldVal2,newVal2
oldVal3,newVal3
oldVal4,newVal4
import csv

filename = "origin.txt"
csv_file = 'test.csv'
conversion_dict = {}

with open(csv_file, "r") as replace:
    reader = csv.reader(replace, delimiter=',')
    for rows in reader:
        conversion_dict.update({rows[0]: rows[1]})

f = open(filename, 'r')
txt = str(f.read())
f.close()

txt = txt.split(',')  # not sure what your origin.txt actually looks like, assuming comma separated values

for i in range(len(txt)):
    if txt[i] in conversion_dict:
        txt[i] = conversion_dict[txt[i]]

with open(filename, "w") as outfile:
    outfile.write(",".join(txt))
modified origin.txt:
oldVal9000,newVal4,newVal1,newVal3,oldVal69
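For completeness: the root cause in the original attempt is that each iteration called txt.replace(...) on the unchanged txt, so only the last replacement survived. If you prefer to keep the whole-file replace() approach rather than splitting on commas, reassigning txt each time fixes it; a minimal sketch:

with open(filename, "r") as fileobject:
    txt = fileobject.read()
for old, new in conversion_dict.items():
    txt = txt.replace(old, new)  # accumulate replacements in txt itself
with open(filename, "w") as outfile:
    outfile.write(txt)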

Write data to a csv file Python

My data looks like below
['[\'Patient, A\', \'G\', \'P\', \'RNA\']']
Irrespective of the brackets, quotes and backslashes, I'd like to separate the data by ',' and write it to a CSV file like below
Patient,A,G,P,RNA
Mentioning delimiter=',' has not helped. The output file then looks like
['Patient, A','G','P','RNA']
all in a single cell. I want to split them into multiple columns. How can I do that?
Edit - Mentioning quotechar='|' split them into different cells but it now looks like
|['Patient, A','G','P','RNA']|
Edit-
out_file_handle = csv.writer(out_file, quotechar='|', lineterminator='\n', delimiter = ",")
data = ''.join(mydict.get(word.lower(), word) for word in re.split('(\W+)', transposed))
data = [data,]
out_file_handle.writerow(data)
transposed:
['Patient, A','G','P','RNA']
data:
['[\'Patient, A\', \'G\', \'P\', \'RNA\']']
And it has multiple rows, the above is one of the rows from the entire data.
You first need to read this data into a Python array, by processing the string as a CSV file in memory:
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
import csv

data = ['[\'Patient, A\', \'G\', \'P\', \'RNA\']']
clean_data = list(csv.reader(StringIO(data[0])))
However, the output is still a single string, because it's not even well-formed CSV! In that case, the best thing might be to filter out all those junk characters:
import re
clean_data = re.sub("[\[\]']","",data[0])
Now clean_data is 'Patient, A, G, P, RNA', which is a clean CSV line you can write straight to a file.
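For example, writing the cleaned string out (a minimal sketch, one such cleaned string per output row):

with open('new.csv', 'w') as out:
    out.write(clean_data + '\n')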
If what you're trying to do is write data in the form of ['[\'Patient, A\', \'G\', \'P\', \'RNA\']'], where you have an array of these strings, to file, then it's really a question in two parts.
The first is how to get the data into the correct format, and the second is how to write it to file.
If that is the form of your data, for every row, then something like this should work (to get it into the correct format):
data = ['[\'Patient, A\', \'G\', \'P\', \'RNA\']', ...]
newData = [entry.replace("\'", "")[1:-1].split(",") for entry in data]
that will give you data in the following form:
[["Patient", "A", "G", "P", "RNA"], ...]
and then you can write it to file as suggested in the other answers:
import csv

with open('new.csv', 'wb') as write_file:
    file_writer = csv.writer(write_file)
    for dataEntry in newData:
        file_writer.writerow(dataEntry)
If you don't actually care about using the data in this round, and just want to clean it up, then you can just do entry.replace("\'", "")[1:-1] for each entry and then write those strings to file.
The [1:-1] bits are just to remove the leading and trailing square brackets.
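Since each entry is really a stringified Python list, another option (not what the answers above use) is to let ast.literal_eval parse it safely instead of stripping characters by hand:

import ast

data = ['[\'Patient, A\', \'G\', \'P\', \'RNA\']']
parsed = [ast.literal_eval(entry) for entry in data]
print(parsed)  # [['Patient, A', 'G', 'P', 'RNA']] -- note 'Patient, A' stays one field

Keep in mind that csv.writer would then quote 'Patient, A' again, so this only helps if you want that embedded comma preserved rather than split into separate columns.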
Python has a CSV writer. Start off with
import csv
Then try something like this
with open('new.csv', 'wb') as write_file:
    file_writer = csv.writer(write_file)
    for i in range(len(data)):
        file_writer.writerow([x for x in data[i]])
Edit:
You might have to wrangle the data a bit first before writing it, since it looks like its a string and not actually a list. Try playing around with the split() function
items = data.split()
"""
SAVING DATA INTO CSV FORMAT
* This format is used for many purposes, mainly for deep learning.
* This type of file can be used to view data in MS Excel or any similar
Application
"""
# == Imports ===================================================================
import csv
import sys
# == Initialisation Function ===================================================
def initialise_csvlog(filename, fields):
"""
Initilisation this function before using the Inserction function
* This Function checks the data before adding new one in order to maintain
perfect mechanisum of insertion
* It check the file if not exists then it creates a new one
* if it exists then it proceeds with getting fields
Parameters
----------
filename : String
Filename along with directory which need to be created
Fields : List
Colomns That need to be initialised
"""
try :
with open(filename,'r') as csvfile:
csvreader = csv.reader(csvfile)
fields = csvreader.next()
print("Data Already Exists")
sys.exit("Please Create a new empty file")
# print fields
except :
with open(filename,'w') as csvfile:
csvwriter = csv.writer(csvfile)
csvwriter.writerow(fields)
# == Data Insertion Function ===================================================
def write_data_csv(filename, row_data):
"""
This Function save the Row Data into the CSV Created
* This adds the row data that is Double Listed
Parameters
----------
filename : String
Filename along with directory which need to be created
row_data : List
Double Listed consisting of row data and column elements in a list
"""
with open(filename,'a') as csvfile:
csvwriter = csv.writer(csvfile)
csvwriter.writerows(row_data)
if __name__ == '__main__':
"""
This function is used to test the Feature Run it independently
NOTE: DATA IN row_data MUST BE IN THE FOLLOWING DOUBLE LISTED AS SHOWN
"""
filename = "TestCSV.csv"
fields = ["sno","Name","Work","Department"]
#Init
initialise_csvlog(filename,fields)
#Add Data
row_data = [["1","Jhon","Coder","Pythonic"]]
write_data_csv(filename,row_data)
# == END =======================================================================
Read the module and you can start using CSV and view the data in Excel or any similar application (Calc in LibreOffice).
NOTE: Remember that the data placed in row_data must be double listed, as shown in the __main__ block.

Remove double quotes from iterator when using csv writer

I want to create a csv from an existing csv, by splitting its rows.
Input csv:
A,R,T,11,12,13,14,15,21,22,23,24,25
Output csv:
A,R,T,11,12,13,14,15
A,R,T,21,22,23,24,25
So far my code looks like:
def update_csv(name):
    # load csv file
    file_ = open(name, 'rb')
    # init first values
    current_a = ""
    current_r = ""
    current_first_time = ""
    file_content = csv.reader(file_)
    # LOOP
    for row in file_content:
        current_a = row[0]
        current_r = row[1]
        current_first_time = row[2]
        i = 2
        # Write row to new csv
        with open("updated_" + name, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow((current_a,
                             current_r,
                             current_first_time,
                             ",".join((row[x] for x in range(i + 1, i + 5)))
                             ))
        # do only one row, for debug purposes
        return
But the row contains double quotes that I can't get rid of:
A002,R051,02-00-00,"05-21-11,00:00:00,REGULAR,003169391"
I've tried to use writer = csv.writer(f,quoting=csv.QUOTE_NONE) and got a _csv.Error: need to escape, but no escapechar set.
What is the correct approach to delete those quotes?
I think you could simplify the logic to split each row into two using something along these lines:
def update_csv(name):
    with open(name, 'rb') as file_:
        with open("updated_" + name, 'wb') as f:
            writer = csv.writer(f)
            # read one row from input csv
            for row in csv.reader(file_):
                # write 2 rows to new csv
                writer.writerow(row[:8])
                writer.writerow(row[:3] + row[8:])
writer.writerow is expecting an iterable such that it can write each item within the iterable as one item, separated by the appropriate delimiter, into the file. So:
writer.writerow([1, 2, 3])
would write "1,2,3\n" to the file.
Your call provides it with an iterable, one of whose items is a string that already contains the delimiter. It therefore needs some way to either escape the delimiter or a way to quote out that item. For example,
writer.writerow([1, '2,3'])
Doesn't just give "1,2,3\n", but e.g. '1,"2,3"\n' - the string counts as one item in the output.
Therefore if you want to not have quotes in the output, you need to provide an escape character (e.g. '/') to mark the delimiters that shouldn't be counted as such (giving something like "1,2/,3\n").
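In csv.writer terms that looks like this (a minimal sketch):

import csv

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='/')
    writer.writerow([1, '2,3'])  # writes 1,2/,3 -- no quotes, delimiter escaped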
However, I think what you actually want to do is include all of those elements as separate items. Don't ",".join(...) them yourself, try:
writer.writerow((current_a, current_r,
                 current_first_time, *row[i+1:i+5]))
to provide the relevant items from row as separate items in the tuple.
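One caveat: unpacking with *row inside a tuple display needs Python 3.5+. Since the question's open(name, 'rb') suggests Python 2, an equivalent that works on both is plain list concatenation:

writer.writerow([current_a, current_r, current_first_time] + list(row[i+1:i+5]))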

Write key to separate csv based on value in dictionary

[Using Python3] I have a csv file with two columns (an email address and a country code; the script is actually made to make it two columns if that's not the case in the original file - kind of) that I want to split out by the value in the second column and output in separate csv files.
eppetj@desrfpkwpwmhdc.com us ==> output-us.csv
uheuyvhy@zyetccm.com de ==> output-de.csv
avpxhbdt@reywimmujbwm.com es ==> output-es.csv
gqcottyqmy@romeajpui.com it ==> output-it.csv
qscar@tpcptkfuaiod.com fr ==> output-fr.csv
qshxvlngi@oxnzjbdpvlwaem.com gb ==> output-gb.csv
vztybzbxqq@gahvg.com us ==> output-us.csv
... ... ...
Currently my code kind of does this, but instead of writing each email address to the csv, it overwrites the one written before it. Can someone help me out with this?
I am very new to programming and Python and I might not have written the code in the most pythonic way, so I would really appreciate any feedback on the code in general!
Thanks in advance!
Code:
import csv

def tsv_to_dict(filename):
    """Creates a reader of a specified .tsv file."""
    with open(filename, 'r') as f:
        reader = csv.reader(f, delimiter='\t')  # '\t' implies tab
        email_list = []
        # Checks each list in the reader list and removes empty elements
        for lst in reader:
            email_list.append([elem for elem in lst if elem != ''])  # List comprehension
    # Stores the list of lists as a dict
    email_dict = dict(email_list)
    return email_dict

def count_keys(dictionary):
    """Counts the number of entries in a dictionary."""
    return len(dictionary.keys())

def clean_dict(dictionary):
    """Removes all whitespace in keys from specified dictionary."""
    return {k.strip(): v for k, v in dictionary.items()}  # Dictionary comprehension

def split_emails(dictionary):
    """Splits out all email addresses from dictionary into output csv files by country code."""
    # Creating a list of unique country codes
    cc_list = []
    for v in dictionary.values():
        if not v in cc_list:
            cc_list.append(v)
    # Writing the email addresses to a csv based on the cc (value) in dictionary
    for key, value in dictionary.items():
        for c in cc_list:
            if c == value:
                with open('output-' + str(c) + '.csv', 'w') as f_out:
                    writer = csv.writer(f_out, lineterminator='\r\n')
                    writer.writerow([key])
You can simplify this a lot by using a defaultdict:
import csv
from collections import defaultdict

emails = defaultdict(list)

with open('email.tsv', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        if row:
            if '@' in row[0]:
                emails[row[1].strip()].append(row[0].strip() + '\n')

for key, values in emails.items():
    with open('output-{}.csv'.format(key), 'w') as f:
        f.writelines(values)
As your separated files are not comma separated but single columns, you don't need the csv module and can simply write the rows.
The emails dictionary contains a key for each country code, and a list of all the matching email addresses. To make sure the email addresses are printed correctly, we remove any whitespace and add a line break (this is so we can use writelines later).
Once the dictionary is populated, it's simply a matter of stepping through the keys to create the files and then writing out the resulting list.
The problem with your code is that it keeps opening the same country output file each time it writes an entry into it, thereby overwriting whatever might have already been there.
A simple way to avoid that is to open all the output files at once for writing and store them in a dictionary keyed by the country code. Likewise, you can have another that associates each country code to a csv.writer object for that country's output file.
Update: While I agree that Burhan's approach is probably superior, I feel that you have the idea that my earlier answer was excessively long due to all the comments it had -- so here's another version of essentially the same logic, but with minimal comments, to let you better discern its reasonably-short true length (even with the contextmanager).
import csv
from contextlib import contextmanager

@contextmanager  # to manage simultaneous opening and closing of output files
def open_country_csv_files(countries):
    csv_files = {country: open('output-' + country + '.csv', 'w')
                 for country in countries}
    yield csv_files
    for f in csv_files.values():
        f.close()

with open('email.tsv', 'r') as f:
    email_dict = {row[0]: row[1] for row in csv.reader(f, delimiter='\t') if row}

countries = set(email_dict.values())
with open_country_csv_files(countries) as csv_files:
    csv_writers = {country: csv.writer(csv_files[country], lineterminator='\r\n')
                   for country in countries}
    for email_addr, country in email_dict.items():
        csv_writers[country].writerow([email_addr])
Not a Python answer, but maybe you can use this Bash solution.
$ while read email country
> do
>     echo $email >> output-$country.csv
> done < in.csv
This reads the lines from in.csv, splits them into two parts email and country, and appends (>>) the email to the file called output-$country.csv.
