Related
I want to create a word dictionary. The dictionary looks like
words_meanings= {
"rekindle": "relight",
"pesky":"annoying",
"verge": "border",
"maneuver": "activity",
"accountability":"responsibility",
}
keys_letter=[]
for x in words_meanings:
keys_letter.append(x)
print(keys_letter)
Output: rekindle , pesky, verge, maneuver, accountability
Here rekindle , pesky, verge, maneuver, accountability they are the keys and relight, annoying, border, activity, responsibility they are the values.
Now I want to create a csv file and my code will take input from the file.
The file looks like
rekindle | pesky | verge | maneuver | accountability
relight | annoying| border| activity | responsibility
So far I use this code to load the file and read data from it.
from google.colab import files
uploaded = files.upload()
import pandas as pd
data = pd.read_csv("words.csv")
data.head()
import csv
reader = csv.DictReader(open("words.csv", 'r'))
words_meanings = []
for line in reader:
words_meanings.append(line)
print(words_meanings)
This is the output of print(words_meanings)
[OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
It looks very odd to me.
keys_letter=[]
for x in words_meanings:
keys_letter.append(x)
print(keys_letter)
Now I create an empty list and want to append only key values. But the output is [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
I am confused. As per the first code block it only included keys but now it includes both keys and their values. How can I overcome this situation?
I would suggest that you format your csv with your key and value on the same row. Like this
rekindle,relight
pesky,annoying
verge,border
This way the following code will work.
words_meanings = {}
with open(file_name, 'r') as file:
for line in file.readlines():
key, value = line.split(",")
word_meanings[key] = value.rstrip("\n")
if you want a list of the keys:
list_of_keys = list(word_meanings.keys())
To add keys and values to the file:
def add_values(key:str, value:str, file_name:str):
with open(file_name, 'a') as file:
file.writelines(f"\n{key},{value}")
key = input("Input the key you want to save: ")
value = input(f"Input the value you want to save to {key}:")
add_values(key, value, file_name)```
You run the same block of code but you use it with different objects and this gives different results.
First you use normal dictionary (check type(words_meanings))
words_meanings = {
"rekindle": "relight",
"pesky":"annoying",
"verge": "border",
"maneuver": "activity",
"accountability":"responsibility",
}
and for-loop gives you keys from this dictionary
You could get the same with
keys_letter = list(words_meanings.keys())
or even
keys_letter = list(words_meanings)
Later you use list with single dictionary inside this list (check type(words_meanings))
words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
and for-loop gives you elements from this list, not keys from dictionary which is inside this list. So you move full dictionary from one list to another.
You could get the same with
keys_letter = words_meanings.copy()
or even the same
keys_letter = list(words_meanings)
from collections import OrderedDict
words_meanings = {
"rekindle": "relight",
"pesky":"annoying",
"verge": "border",
"maneuver": "activity",
"accountability":"responsibility",
}
print(type(words_meanings))
keys_letter = []
for x in words_meanings:
keys_letter.append(x)
print(keys_letter)
#keys_letter = list(words_meanings.keys())
keys_letter = list(words_meanings)
print(keys_letter)
words_meanings = [OrderedDict([('\ufeffrekindle', 'relight'), ('pesky', 'annoying')])]
print(type(words_meanings))
keys_letter = []
for x in words_meanings:
keys_letter.append(x)
print(keys_letter)
#keys_letter = words_meanings.copy()
keys_letter = list(words_meanings)
print(keys_letter)
The default field separator for the csv module is a comma. Your CSV file uses the pipe or bar symbol |, and the fields also seem to be fixed width. So, you need to specify | as the delimiter to use when creating the CSV reader.
Also, your CSV file is encoded as Big-endian UTF-16 Unicode text (UTF-16-BE). The file contains a byte-order-mark (BOM) but Python is not stripping it off, so you will notice the string '\ufeffrekindle' contains the FEFF UTF-16-BE BOM. That can be dealt with by specifying encoding='utf16' when you open the file.
import csv
with open('words.csv', newline='', encoding='utf-16') as f:
reader = csv.DictReader(f, delimiter='|', skipinitialspace=True)
for row in reader:
print(row)
Running this on your CSV file produces this:
{'rekindle ': 'relight ', 'pesky ': 'annoying', 'verge ': 'border', 'maneuver ': 'activity ', 'accountability': 'responsibility'}
Notice that there is trailing whitespace in the key and values. skipinitialspace=True removed the leading whitespace, but there is no option to remove the trailing whitespace. That can be fixed by exporting the CSV file from Excel without specifying a field width. If that can't be done, then it can be fixed by preprocessing the file using a generator:
import csv
def preprocess_csv(f, delimiter=','):
# assumes that fields can not contain embedded new lines
for line in f:
yield delimiter.join(field.strip() for field in line.split(delimiter))
with open('words.csv', newline='', encoding='utf-16') as f:
reader = csv.DictReader(preprocess_csv(f, '|'), delimiter='|', skipinitialspace=True)
for row in reader:
print(row)
which now outputs the stripped keys and values:
{'rekindle': 'relight', 'pesky': 'annoying', 'verge': 'border', 'maneuver': 'activity', 'accountability': 'responsibility'}
As I found that no one able to help me with the answer. Finally, I post the answer here. Hope this will help other.
import csv
file_name="words.csv"
words_meanings = {}
with open(file_name, newline='', encoding='utf-8-sig') as file:
for line in file.readlines():
key, value = line.split(",")
words_meanings[key] = value.rstrip("\n")
print(words_meanings)
This is the code to transfer a csv to a dictionary. Enjoy!!!
UPDATED(I'm sorry, it's my first question)
I'm an intern and really new into coding.
In my job, I have to read a file from Azure storage and then insert this data into a database.
To do this, I'm using get_file_to_text().content and storing its value in a variable file as follows:
file = file_service.get_file_to_text('teste','','Retorno.csv').content
and then, I'm using .splitlines() like this:
formFile.append(file.splitlines())
I expected a result like this(each line of my file being a sublist):
[['2017-08-01', 'Zabbix server Sura', 'system.cpu.load[percpu,avg5]', '0.2900', '0.05217361111111111111', '0.1'], ['2017-08-01', 'Zabbix server Sura', 'system.cpu.util[,iowait]' ... ]
But I've got this(One big sublist with all the lines inside):
[['2017-08-01;Zabbix server Sura;system.cpu.load[percpu,avg5];0.2900;0.05217361111111111111;0.1', '2017-08-01;Zabbix server Sura;system.cpu.util[,iowait]; ... ']]
I also tried a .split(';'):
file2 = file.split(';')
But it returns me a list with the values only:
['2017-08-01', 'Zabbix server Sura', 'system.cpu.load[percpu,avg5]', '0.2900', '0.05217361111111111111', '0.1\n2017-08-01', 'Zabbix server Sura', 'system.cpu.util[,iowait]', ...]
What can I do toget the result I expect?
Thanks!
UPDATE (RESOLVED):
I did this an it worked fine.
data = []
azurestorage_text = file_service.get_file_to_text('teste', '',
'Retorno.csv').content
with StringIO(azurestorage_text) as file_obj:
reader = csv.reader(file_obj, delimiter=';')
header = next(reader)
for line in reader:
data.append(line)
.splitlines() will split the lines in the text input, returning a list of whole lines. In order to parse that into fields (bits between semicolons) you would need to then .split(';') each line, e.g.
lines = text.splitlines()
rows = []
for line in lines:
row.append(line.split(';'))
However if you want to split semicolon-separated text like this you should be using csv.reader to parse the data. It is more robust at handling CSV formats, including for example "quoted text". Splitting on semicolons will break if any of the fields in the data have semicolons within them, e.g. "semicolons in quoted; text".
csv.reader requires a file-like object to operate on, rather than a string. To pass in a string, you can use StringIO to create a file-like interface to it:
For Python2:
from StringIO import cStringIO as StringIO
import csv
text = file_service.get_file_to_text('teste','','Retorno.csv').content
file_obj = StringIO(text)
reader = csv.reader(file_obj, delimiter=';')
for row in reader:
print(row)
For Python3:
from io import StringIO
import csv
file_obj = StringIO(text)
text = file_service.get_file_to_text('teste','','Retorno.csv').content
file_obj = StringIO(text)
reader = csv.reader(file_obj, delimiter=';')
for row in reader:
print(row)
Each row will contain a single line from your file, split into fields on the semicolons (specified by delimiter).
Below, I am trying to replace data in a csv. The code works, but it replaces anything matching stocklevelin the file.
def updatestocklevel(quantity, stocklevel, code):
newlevel = stocklevel - quantity
stocklevel = str(stocklevel)
newlevel = str(newlevel)
s = open("stockcontrol.csv").read()
s = s.replace (stocklevel ,newlevel) #be careful - will currently replace any number in the file matching stock level!
f = open("stockcontrol.csv", 'w')
f.write(s)
f.close()
My csv looks like this;
34512340,1
12395675,2
56756777,1
90673412,2
12568673,3
22593672,5
65593691,4
98593217,2
98693214,2
98693399,5
11813651,85
98456390,8
98555567,3
98555550,45
98553655,2
96553657,1
91823656,2
99823658,2
Elsewhere in my program, I have a function that searches for the code (8 digits)
Is it possible to say, if the code is in the line of the csv, replace the data in the second column? (data[2])
All the occurances of stocklevel are getting replaced with the value of newlevel as you are calling s.replace (stocklevel ,newlevel).
string.replace(s, old, new[, maxreplace]): Return a copy of string s
with all occurrences of substring old replaced by new. If the optional
argument maxreplace is given, the first maxreplace occurrences are
replaced.
source
As you suggested, you need to get the code and use it replace the stock level.
This is a sample script which takes the 8 digit code and the new stock level as the command line arguments adn replaces it:
import sys
import re
code = sys.argv[1]
newval= int(sys.argv[2])
f=open("stockcontrol.csv")
data=f.readlines()
print data
for i,line in enumerate(data):
if re.search('%s,\d+'%code,line): # search for the line with 8-digit code
data[i] = '%s,%d\n'%(code,newval) # replace the stock value with new value in the same line
f.close()
f=open("in.csv","w")
f.write("".join(data))
print data
f.close()
Another solution using the csv module of Python:
import sys
import csv
data=[]
code = sys.argv[1]
newval= int(sys.argv[2])
f=open("stockcontrol.csv")
reader=csv.DictReader(f,fieldnames=['code','level'])
for line in reader:
if line['code'] == code:
line['level']= newval
data.append('%s,%s'%(line['code'],line['level']))
f.close()
f=open("stockcontrol.csv","w")
f.write("\n".join(data))
f.close()
Warning: Keep a back up of the input file while trying out these scripts as they overwrite the input file.
If you save the script in a file called test.py then invoke it as:
python test.py 34512340 10.
This should replace the stockvalue of code 34512340 to 10.
Why not using good old regular expressions?
import re
code, new_value = '11813651', '885' # e.g, change 85 to 885 for code 11813651
print (re.sub('(^%s,).*'%code,'\g<1>'+new_value,open('stockcontrol.csv').read()))
Since it's a csv file I'd suggest using Python's csv module. You will need to write to a new file since reading and writing to the same file will turn out bad. You can always rename it afterwards.
This example uses StringIO (Python 2) to embed your csv data in the code and treat it as a file. Normally you would open a file to get the input.
Updated
import csv
# Python 2 and 3
try:
from StringIO import StringIO
except ImportError:
from io import StringIO
CSV = """\
34512340,1
12395675,2
56756777,1
90673412,2
12568673,3
22593672,5
65593691,4
98593217,2
98693214,2
98693399,5
11813651,85
98456390,8
98555567,3
98555550,45
98553655,2
96553657,1
91823656,2
99823658,2
"""
def replace(key, value):
fr = StringIO(CSV)
with open('out.csv', 'w') as fw:
r = csv.reader(fr)
w = csv.writer(fw)
for row in r:
if row[0] == key:
row[1] = value
w.writerow(row)
replace('99823658', 42)
My data looks like below
['[\'Patient, A\', \'G\', \'P\', \'RNA\']']
Irrespective of the brackets, quotes and back slashes, I'd like to separate the data by ',' and write to a CSV file like below
Patient,A,G,P,RNA
Mentioning delimiter = ',' has done no help. The output file then looks like
['Patient, A','G','P','RNA']
all in a single cell. I want to split them into multiple columns. How can I do that?
Edit - Mentioning quotechar='|' split them into different cells but it now looks like
|['Patient, A','G','P','RNA']|
Edit-
out_file_handle = csv.writer(out_file, quotechar='|', lineterminator='\n', delimiter = ",")
data = ''.join(mydict.get(word.lower(), word) for word in re.split('(\W+)', transposed))
data = [data,]
out_file_handle.writerow(data)
transposed:
['Patient, A','G','P','RNA']
data:
['[\'Patient, A\', \'G\', \'P\', \'RNA\']']
And it has multiple rows, the above is one of the rows from the entire data.
You first need to read this data into a Python array, by processing the string as a CSV file in memory:
from StringIO import StringIO
import csv
data = ['[\'Patient, A\', \'G\', \'P\', \'RNA\']']
clean_data = list(csv.reader( StringIO(data[0]) ))
However the output is still a single string, because it's not even a well-formed CSV! In which case, the best thing might be to filter out all those junk characters?
import re
clean_data = re.sub("[\[\]']","",data[0])
Now data[0] is 'Patient, A, G, P, RNA' which is a clean CSV you can write straight to a file.
If what you're trying to do is write data in the form of ['[\'Patient, A\', \'G\', \'P\', \'RNA\']'], where you have an array of these strings, to file, then it's really a question in two parts.
The first, is how do you separate the data into the correct format, and then the second is is to write it to file.
If that is the form of your data, for every row, then something like this should work (to get it into the correct format):
data = ['[\'Patient, A\', \'G\', \'P\', \'RNA\']', ...]
newData = [entry.replace("\'", "")[1:-1].split(",") for entry in data]
that will give you data in the following form:
[["Patient", "A", "G", "P", "RNA"], ...]
and then you can write it to file as suggested in the other answers;
with open('new.csv', 'wb') as write_file:
file_writer = csv.writer(write_file)
for dataEntry in range(newData ):
file_writer.writerow(dataEntry)
If you don't actually care about using the data in this round, and just want to clean it up, then you can just do data.replace("\'", "")[1:-1] and then write those strings to file.
The [1:-1] bits are just to remove the leading and trailing square brackets.
Python has a CSV writer. Start off with
import csv
Then try something like this
with open('new.csv', 'wb') as write_file:
file_writer = csv.writer(write_file)
for i in range(data):
file_writer.writerow([x for x in data[i]])
Edit:
You might have to wrangle the data a bit first before writing it, since it looks like its a string and not actually a list. Try playing around with the split() function
list = data.split()
"""
SAVING DATA INTO CSV FORMAT
* This format is used for many purposes, mainly for deep learning.
* This type of file can be used to view data in MS Excel or any similar
Application
"""
# == Imports ===================================================================
import csv
import sys
# == Initialisation Function ===================================================
def initialise_csvlog(filename, fields):
"""
Initilisation this function before using the Inserction function
* This Function checks the data before adding new one in order to maintain
perfect mechanisum of insertion
* It check the file if not exists then it creates a new one
* if it exists then it proceeds with getting fields
Parameters
----------
filename : String
Filename along with directory which need to be created
Fields : List
Colomns That need to be initialised
"""
try :
with open(filename,'r') as csvfile:
csvreader = csv.reader(csvfile)
fields = csvreader.next()
print("Data Already Exists")
sys.exit("Please Create a new empty file")
# print fields
except :
with open(filename,'w') as csvfile:
csvwriter = csv.writer(csvfile)
csvwriter.writerow(fields)
# == Data Insertion Function ===================================================
def write_data_csv(filename, row_data):
"""
This Function save the Row Data into the CSV Created
* This adds the row data that is Double Listed
Parameters
----------
filename : String
Filename along with directory which need to be created
row_data : List
Double Listed consisting of row data and column elements in a list
"""
with open(filename,'a') as csvfile:
csvwriter = csv.writer(csvfile)
csvwriter.writerows(row_data)
if __name__ == '__main__':
"""
This function is used to test the Feature Run it independently
NOTE: DATA IN row_data MUST BE IN THE FOLLOWING DOUBLE LISTED AS SHOWN
"""
filename = "TestCSV.csv"
fields = ["sno","Name","Work","Department"]
#Init
initialise_csvlog(filename,fields)
#Add Data
row_data = [["1","Jhon","Coder","Pythonic"]]
write_data_csv(filename,row_data)
# == END =======================================================================
Read the Module and you can start using CSV and view data in Excel or any similar application (calc in libreoffice)
NOTE: Remember to place list of data to be double listed as shown in __main__ function (row_data)
def export_csv(request):
response = HttpResponse(mimetype = 'text/csv')
response['Content-Disposition'] = 'attachment; filename=exported.csv'
writer = csv.writer(response)
data = [['First-Name','Last-name'],['Foo','baar']]
for entry in data:
writer.writerow(entry)
return response
Above code is written for outputting csv file from django. Problem I am facing is the exported/outputted file's content are not displaying in different boxes in csv file editor(s). For each entry in data (list) it is getting printed in same box rather it should interpret each entry's value(First-Name, Last-Name) in different box.
Actual result in csv file:
|First-Name,Last-Name |
|Foor,bar |
Expected:
|First-Name |Last-Name |
|Foo |Baar |
How can I get mechanism by which it will export the file independent on the list size and its contents?
You could set your own dialect for csv.writer with your own custom delimiters etc :
class dia2(csv.Dialect):
delimiter = '\t' # delimiter should be only 1-char
quotechar = '"'
escapechar = None
doublequote = True
skipinitialspace = False
lineterminator = '\r\n'
quoting = 1
writer = csv.writer(response, dialect=dia2)
...
The main cause of the problem with different editors different behaviour - that you should correctly set CSV-delimiter to one you really used - in each editor itself. One more possible reason - is that you don't use quoting for each CSV-value (that's quoting=1 in dialect class).
One more thing you could expect, according to your expected output- the same length for each field, filling spaces for the rest - that couldn't be done through csv module, but you could format your data to the same length for each cell using standard python formats like:
data[i] = "%10s" % data[i]