Converting a dictionary with nested lists into a CSV file - python

Been having a hard time trying to convert this dictionary with nested lists into a CSV file. I have a CSV file I am filtering - https://easyupload.io/8zobej. I turned it into a dictionary then cleaned it up. I am now stuck on trying to output it to a CSV and I don't know what to do. I've tried many different combinations of DictWriter and writerows but I keep coming up short. I am now trying to come up with a for loop that would go through the dictionary and output the value it finds to the CSV.
Here is my code - please excuse the comments - I was trying many things.
import csv

def dataSorter(filename: str):
    """
    The defined function scans the input CSV file with 2 columns (Category, value)
    and sorts the values into categories, giving us a list of values for each category.
    Done by
    """
    # Open the input csv file and parse it as comma-delimited
    with open(filename) as inputcsv:
        readcsv = csv.reader(inputcsv, delimiter=',')
        sortedData = {}
        # skip the header row
        next(readcsv)
        # loop through the file and collect values under their category key in "sortedData"
        for row in readcsv:
            category = row[0]
            if category not in sortedData:
                sortedData[category] = [row[1]]
            elif row[1] not in sortedData[category]:
                sortedData[category].append(row[1])
        # sort each category's list once, after all rows have been read
        for category in sortedData:
            sortedData[category].sort()
        return sortedData
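For the output step the question asks about, here is a minimal sketch, assuming each (Category, value) pair should become one row of the output CSV; writeSorted, 'input.csv', and 'sorted.csv' are placeholder names:

import csv

def writeSorted(sortedData: dict, outname: str):
    with open(outname, 'w', newline='') as outcsv:
        writer = csv.writer(outcsv)
        writer.writerow(['Category', 'value'])
        # one row per value, grouped by category
        for category, values in sortedData.items():
            for value in values:
                writer.writerow([category, value])

writeSorted(dataSorter('input.csv'), 'sorted.csv')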

Related

How to sort a dictionary list with multiple values related to one single key name and convert it to float numbers

I'm a beginner at Python and I'm trying to figure out a paper I have to do for school. The situation is that I have a CSV file opened under DictReader, and I want to not only sort all the values related to one key in the dictionary, which is 'CRIM', in ascending order, but also convert all the data values to float so I can more easily manipulate the information after that.
The CSV file data looks like this: {"CRIM": "0,00522"} {"CRIM": "0,06552"} {"CRIM": "0,01903"} {"CRIM": "0,01263"}
I started doing this:
import csv

with open('data. Question 1.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    for line in data:
        list_values = (line['CRIM'])
This makes all the values related to the key 'CRIM' appear, but I don't know how to proceed from there.
Can somebody help me?
If you just need a list of all of the float "CRIM" values, you can try this. It assumes that "CRIM" appears in every line and that each float value follows the same format as your sample data.
import csv

list_values = []
with open('data. Question 1.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    for line in data:
        list_values.append(float(line['CRIM'].replace(',', '.')))
list_values.sort()
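For the four sample values above, list_values would end up as [0.00522, 0.01263, 0.01903, 0.06552].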

Read csv files in a loop, isolating the non-empty elements from a specific column each time-Python

I am new to Python and I am attempting to read .csv files sequentially in a for or while loop (I have about 250 of them). The aim is to read each .csv file and keep only the rows where a specific column (let's call it "wanted_column") is non-empty, together with all of their columns. Then, save all those non-empty rows, with all of their columns, to a new .csv file.
Hence, at the end, I want to have 250 .csv files, each containing all the columns of every row that has a non-empty element in the "wanted_column".
Hopefully this is clear. I would appreciate any ideas.
George
I wrote the code below just to give you an idea of how to do it. Beware that it does not check for any errors: its behavior is undefined if one of your CSV files is empty, if a file can't be found, or if the column you specified doesn't exist in one of the files. There could be more cases. Thus, you would want to build error checking around it. Also, how your CSV is parsed depends largely on Python's csv package.
Now to the code explanation. The "paths" parameter accepts a string, a tuple, or a list; if you give it a string, it is converted to a one-element tuple. It holds the file(s) that you want to work with.
The "column" parameter should be a string; build error checking for it if needed.
As for the routine itself, the function reads every CSV file in the paths list. For each file, it first reads the header line and saves its contents in a variable (rowFields).
After that, it builds the header dict (fields), mapping each column name to its position. That dict is used to look up a column's position by its name. Alternatively, you could walk through the fields once and, when a field matches the column name, save its index as the column position; that saved position could then be reused instead of repeatedly searching the dict by name. This latter method should be the faster of the two.
It then reads each remaining row of the CSV file until the end. For each row, it checks whether the string in the column named by the "column" parameter has a length greater than zero; if it does, it appends that row to contentRows.
Once the function is done reading a CSV file, it writes the contents of rowFields and contentRows to the CSV file named by the outfile variable. To make it easy for myself, outfile is simply the input file name plus ".new"; you can just change that.
import csv

def getNoneEmpty(paths, column):
    # accept a single path string, or a list/tuple of paths
    if isinstance(paths, str):
        paths = (paths, )
    if not isinstance(paths, (list, tuple)):
        raise TypeError("paths has to be a string, list, or tuple")
    quotechar = '"'
    delimiter = ","
    lineterminator = "\n"
    for f in paths:
        outfile = f + ".new"  # change this area to how you want to generate the new file
        fields = {}
        rowFields = None
        contentRows = []
        with open(f, newline='') as csvfile:
            csvReader = csv.reader(csvfile, delimiter=delimiter, quotechar=quotechar, lineterminator=lineterminator)
            # read the header and map each column name to its position
            rowFields = next(csvReader)
            for i in range(len(rowFields)):
                fields[rowFields[i]] = i
            # keep only the rows whose wanted column is non-empty
            for row in csvReader:
                if len(row[fields[column]]) != 0:
                    contentRows.append(row)
        with open(outfile, 'w', newline='') as csvfile:
            csvWriter = csv.writer(csvfile, delimiter=delimiter, quotechar=quotechar, quoting=csv.QUOTE_MINIMAL, lineterminator=lineterminator)
            csvWriter.writerow(rowFields)
            csvWriter.writerows(contentRows)

getNoneEmpty(["test.csv", "test2.csv"], "1958")
test.csv content:
"Month","1958","1959","1960"
"JAN",115,360,417
"FEB",318,342,391
"MAR",362,406,419
"APR",348,396,461
"MAY",11,420,472
"JUN",124,472,535
"JUL",158,548,622
"AUG",505,559,606
"SEP",404,463,508
"OCT",,407,461
"NOV",310,362,390
"DEC",110,405,432
test2.csv content:
"Month","1958","1959","1960"
"JAN",,360,417
"FEB",318,342,391
"MAR",362,406,419
"APR",348,396,461
"MAY",,420,472
"JUN",,472,535
"JUL",,548,622
"AUG",505,559,606
"SEP",404,463,508
"OCT",,407,461
"NOV",310,362,390
"DEC",110,405,432
Hopefully it will work:
import csv

def main():
    temp = []
    with open(r'old_csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=';')
        for row in csv_reader:
            for x in row:
                temp.append(x)
    with open(r'new_csv', mode='w', newline='') as new_file:
        writer = csv.writer(new_file, delimiter=',', lineterminator='\n')
        for col in temp:
            list_ = col.split(',')
            writer.writerow(list_)

main()

Python code to process CSV file

I am getting the CSV file updated on a daily basis and need to process it to create a new file based on this criteria: if a row contains new data, it should be tagged as new, and if it is an update to existing data, it should be tagged as update. How do I write Python code that processes the input and writes a CSV file as follows, based on the date?
Day1 input data
empid,enmname,sal,datekey
1,cholan,100,8/14/2018
2,ram,200,8/14/2018
Day2 input Data
empid,enmname,sal,datekey
1,cholan,100,8/14/2018
2,ram,200,8/14/2018
3,sundar,300,8/15/2018
2,raman,200,8/15/2018
Output Data
status,empid,enmname,sal,datekey
new,3,sundar,300,8/15/2018
update,2,raman,200,8/15/2018
I'm feeling nice, so I'll give you some code. Try to learn from it.
To work with CSV files, we'll need the csv module:
import csv
First off, let's teach the computer how to open and parse a CSV file:
def parse(path):
    with open(path) as f:
        return list(csv.DictReader(f))
csv.DictReader reads the first line of the csv file and uses it as the "names" of the columns. It then creates a dictionary for each subsequent row, where the keys are the column names.
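For example, with the Day 1 data above, each row comes back as a dictionary (all values read as strings):

{'empid': '1', 'enmname': 'cholan', 'sal': '100', 'datekey': '8/14/2018'}
{'empid': '2', 'enmname': 'ram', 'sal': '200', 'datekey': '8/14/2018'}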
That's all well and good, but we just want the last version with each key:
def parse(path):
    data = {}
    with open(path) as f:
        for row in csv.DictReader(f):
            data[row["empid"]] = row
    return data
Instead of just creating a list containing everything, this creates a dictionary where the keys are the row's id. This way, rows found later in the file will overwrite rows found earlier in the file.
Now that we've taught the computer how to extract the data from the files, let's get it:
old_data = parse("file1.csv")
new_data = parse("file2.csv")
Iterating through a dictionary gives you its keys, which are the ids defined in the data set. Conveniently, key in dictionary says whether key is one of the keys in the dictionary. So we can do this:
new = {
    id_: row
    for id_, row in new_data.items()
    if id_ not in old_data
}
updated = {
    id_: row
    for id_, row in new_data.items()
    if id_ in old_data and old_data[id_] != row
}
I'll put csv.DictWriter here and let you sort out the rest on your own.
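As a starting point, here is a minimal sketch of that last step, assuming the output layout shown in the question; 'output.csv' is a placeholder name:

with open('output.csv', 'w', newline='') as f:
    fieldnames = ['status', 'empid', 'enmname', 'sal', 'datekey']
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    # tag each row with its status, then copy the original columns
    for row in new.values():
        writer.writerow({'status': 'new', **row})
    for row in updated.values():
        writer.writerow({'status': 'update', **row})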

Filtering lists with elements of another list - Python

so I have two files, one containing barcodes and the other containing what I want to search.
File 1 is in the format:
BC01 123
BC02 124
BC03 125
my second file is in the format:
INV01 123axxxx
INV02 123bxxxx
INV03 124cxxxx
INV04 125dxxxx
Both files are tab delimited between the "tag" and the rest of the line.
So what I'm currently trying to do is to search the second file with the barcodes found in the first and output the matches to separate files.
The end result that I want is 3 separate files, BC01, BC02, and BC03, each with the corresponding inventory numbers and with the barcode cut off, for example:
file BC01 would read:
INV01
axxxx
INV02
bxxxx
What I have right now are lists of the separate tab delimited portions of both files: BCID, BCnumber, INVID, and INVnumber and I'm not quite sure how to proceed from here.
Create a dictionary out of file_1:
import csv

barcodes = {}
with open(file_1) as file_one:
    csv_reader = csv.reader(file_one, delimiter='\t')
    for row in csv_reader:
        barcodes[row[1]] = row[0]
Use this to search in the second file and build an output map:
from collections import defaultdict

output = defaultdict(list)
with open(file_2) as file_two:
    csv_reader = csv.reader(file_two, delimiter='\t')
    for row in csv_reader:
        # the first 3 characters of the inventory number are the barcode
        key = row[1][:3]
        output[barcodes[key]].append(row[0])
        output[barcodes[key]].append(row[1][3:])
The output dictionary would then be:
{
    'BC01': ['INV01', 'axxxx', 'INV02', 'bxxxx'],
    'BC02': ['INV03', 'cxxxx'],
    'BC03': ['INV04', 'dxxxx']
}
Now iterate through this dictionary, create files with the names of the keys and write out the contents of the file with the corresponding values.
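A minimal sketch of that final step, assuming each value should go on its own line as in the example output above:

for barcode, entries in output.items():
    with open(barcode, 'w') as out_file:
        for entry in entries:
            out_file.write(entry + '\n')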

How to read a text file into a list or an array with Python

I am trying to read the lines of a text file into a list or array in python. I just need to be able to individually access any item in the list or array after it is created.
The text file is formatted as follows:
0,0,200,0,53,1,0,255,...,0.
Where the ... is above, there actual text file has hundreds or thousands more items.
I'm using the following code to try to read the file into a list:
text_file = open("filename.dat", "r")
lines = text_file.readlines()
print lines
print len(lines)
text_file.close()
The output I get is:
['0,0,200,0,53,1,0,255,...,0.']
1
Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?
You will have to split your string into a list of values using split()
So,
lines = text_file.read().split(',')
EDIT:
I didn't realise there would be so much traction to this. Here's a more idiomatic approach.
import csv
with open('filename.csv', 'r') as fd:
    reader = csv.reader(fd)
    for row in reader:
        # do something
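Each row comes back as a list of strings; for the line shown in the question, that would be ['0', '0', '200', '0', '53', '1', '0', '255', ...].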
You can also use numpy loadtxt like
from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)
So you want to create a list of lists... We need to start with an empty list
list_of_lists = []
next, we read the file content, line by line
with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # alternatively, if you need to use the file content as numbers:
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)
A common use case is that of columnar data, but our units of storage are the rows of the file, which we have read one by one, so you may want to transpose your list of lists. This can be done with the following idiom:
by_cols = zip(*list_of_lists)
Another common use is to give a name to each column
col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]
so that you can operate on homogeneous data items
mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]
Most of what I've written can be sped up using the csv module from the standard library. Another option is the third-party module pandas, which lets you automate most aspects of a typical data analysis (but has a number of dependencies).
Update While in Python 2 zip(*list_of_lists) returns a different (transposed) list of lists, in Python 3 the situation has changed and zip(*list_of_lists) returns a zip object that is not subscriptable.
If you need indexed access you can use
by_cols = list(zip(*list_of_lists))
that gives you a list of lists in both versions of Python.
On the other hand, if you don't need indexed access and what you want is just to build a dictionary indexed by column names, a zip object is just fine...
file = open('some_data.csv')
names = get_names(next(file))  # get_names: your helper that parses the header line into column names
columns = zip(*((x.strip() for x in line.split(',')) for line in file))
d = {}
for name, column in zip(names, columns):
    d[name] = column
This question is asking how to read the comma-separated value contents from a file into an iterable list:
0,0,200,0,53,1,0,255,...,0.
The easiest way to do this is with the csv module as follows:
import csv

with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')

Now, still inside the with block, you can easily iterate over spamreader like this:

    for row in spamreader:
        print(', '.join(row))
See documentation for more examples.
I'm a bit late, but you can also read the text file into a dataframe and then convert the corresponding column to a list.

import pandas as pd

lista = pd.read_csv('path_to_textfile.txt', sep=",", header=None)[0].tolist()
Example:
lista=pd.read_csv('data/holdout.txt',sep=',',header=None)[0].tolist()
Note: the column names of the resulting dataframe are integers, and I chose 0 because I was extracting only the first column.
Better this way:

def txt_to_lst(file_path):
    try:
        # read the whole file and split it into one string per line
        with open(file_path, "r") as stopword:
            lines = stopword.read().split('\n')
        print(lines)
        return lines
    except Exception as e:
        print(e)
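For example, with the file from the question, the call would look like:

lines = txt_to_lst("filename.dat")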
