Dynamically create lists of lists from CSV file data - python

I have a CSV file that I am reading as a configuration file to create a list of lists to store data in.
The format of my CSV file is:
list_name, search_criteria
channel1, c1
channel2, c2
channel3, c3
I want to read in the CSV file and dynamically create the list of lists from the list_name data, since it could grow and shrink over time and I always want whatever is defined in the CSV file.
The list_name in the CSV file is a "prefix" to the list name I want to create dynamically. For example, read in "channel1", "channel2", "channel3" from the CSV file and create a list of lists where "mainList[]" is the core list and contains 3 lists within it named "channel1_channel_list", "channel2_channel_list", "channel3_channel_list".
I realize my naming conventions could be simplified, so please disregard that; I'll rename things once I have a working solution. I will be using the search criteria to populate the lists within mainList[].
Here is my incomplete code:
import csv

mainList = []
with open('list_config.csv') as input_file:
    dictReader = csv.DictReader(input_file)
    for row in dictReader:
        listName = row['list_name'] + '_channel_list'

Here's how to read your data and create a dict from it.
As well as reading data from files, the csv readers can read their data from a list of strings, which is handy for example code like this. With your data you need to specify skipinitialspace=True to skip over the spaces after the commas.
import csv
data = '''\
list_name, search_criteria
channel1, c1
channel2, c2
channel3, c3
'''.splitlines()
dictReader = csv.DictReader(data, skipinitialspace=True)
main_dict = {}
for row in dictReader:
    name = row['list_name'] + '_channel_list'
    criteria = row['search_criteria']
    main_dict[name] = criteria
# Print data sorted by list_name
keys = sorted(main_dict)
for k in keys:
    print(k, main_dict[k])
output
channel1_channel_list c1
channel2_channel_list c2
channel3_channel_list c3
This code is a little more complicated than Joe Iddon's version, but it's a little easier to adapt if you have more than two entries per row. OTOH, if you do only have two entries per row then you should probably use Joe's simpler approach.
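As a hedged sketch of that adaptation (assuming the same list_config.csv file as above, and that any extra columns should simply be kept alongside search_criteria), you could store the whole row instead of a single value:
import csv

main_dict = {}
with open('list_config.csv') as input_file:
    for row in csv.DictReader(input_file, skipinitialspace=True):
        name = row['list_name'] + '_channel_list'
        # keep every remaining column, so extra fields survive unchanged
        main_dict[name] = {k: v for k, v in row.items() if k != 'list_name'}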

Load in the file in a with statement, then use csv.reader which returns a reader object. This can then be converted to a dictionary by passing it into dict():
import csv

with open('list_config.csv') as input_file:
    dictionary = dict(csv.reader(input_file))
Now, the contents of dictionary is:
{'channel3': ' c3', 'channel1': ' c1', 'channel2': ' c2'}
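If the leading spaces in the values and the header row are unwanted, a small variant (a sketch, assuming the two-column file shown in the question) could be:
import csv

with open('list_config.csv') as input_file:
    reader = csv.reader(input_file, skipinitialspace=True)
    next(reader)  # skip the header row
    main_dict = {name + '_channel_list': criteria for name, criteria in reader}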

Related

Converting a dictionary with nested lists into a CSV file

Been having a hard time trying to convert this dictionary with nested lists into a CSV file. I have a CSV file I am filtering - https://easyupload.io/8zobej. I turned it into a dictionary then cleaned it up. I am now stuck on trying to output it to a CSV and I don't know what to do. I've tried many different combinations of DictWriter and writerows but I keep coming up short. I am now trying to come up with a for loop that would go through the dictionary and output the value it finds to the CSV.
Here is my code - please excuse the comments - I was trying many things.
import csv

def dataSorter(filename: str):
    """
    Scans the inputted CSV file with 2 columns (Category, value) and sorts the values into categories,
    giving us a list of values for each category.
    Done by
    """
    # Open the input csv file and parse it as comma delimited
    with open(filename) as inputcsv:
        readcsv = csv.reader(inputcsv, delimiter=',')
        sortedData = {}
        # skip the header row
        next(readcsv)
        # loop through the file and assign values to the key in dictionary "sortedData"
        for i in readcsv:
            category = i[0]
            if category not in sortedData:
                sortedData[category] = [i[1]]
            else:
                if i[1] not in sortedData[category]:
                    sortedData[category].append(i[1])
                    sortedData[category].sort()
        # sort each category's list of values
        for category in sortedData.keys():
            sortedData[category].sort()
        return sortedData
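For the output step the asker is stuck on, here is a minimal sketch (the input and output file names are made up, and it assumes one category,value pair per output row rather than any particular DictWriter layout):
import csv

sortedData = dataSorter("input.csv")  # hypothetical input file name

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Category", "value"])
    # flatten the dictionary of lists into one row per (category, value) pair
    for category, values in sortedData.items():
        for value in values:
            writer.writerow([category, value])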

output file to CSV

I'm trying to parse data from a JSON file and create a CSV file from that output. I've written a Python script to create the output as per my needs. I need to sort the CSV file below by date and time.
current output
My code:
import csv

## Shift Start | End Time | Primary | Secondary
def write_CSV():
    # field names
    fields = ['ShiftStart', 'EndTime', 'Primary', 'Secondary']
    # name of csv file
    filename = "CallingLog.csv"
    # writing to csv file
    with open(filename, 'w') as csvfile:
        # creating a csv dict writer object
        writer = csv.DictWriter(csvfile, delimiter=',', lineterminator='\n', fieldnames=fields)
        # writing headers (field names)
        writer.writeheader()
        # writing data rows
        writer.writerows(totalData)
I want my CSV file to be sorted by date and time like below. At least by date would be fine.
ShiftStart
2020-11-30T17:00:00-08:00
2020-12-01T01:00:00-08:00
2020-12-02T05:00:00-08:00
2020-12-03T05:00:00-08:00
2020-12-04T09:00:00-08:00
2020-12-05T13:00:00-08:00
2020-12-06T13:00:00-08:00
2020-12-07T09:00:00-08:00
2020-12-08T17:00:00-08:00
2020-12-09T09:00:00-08:00
2020-12-10T09:00:00-08:00
2020-12-11T17:00:00-08:00
YourDataframe.sort_values(['Col1','Col2']).to_csv('Path')
Try this: it not only sorts and copies to CSV, but also retains the original unsorted dataframe in the program for further operations if needed.
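Applied to the asker's script, a hedged sketch (assuming totalData is the list of row dicts built from the JSON, as in the question) might look like:
import pandas as pd

# build a dataframe from the parsed rows, sort chronologically, and write it out
df = pd.DataFrame(totalData)
df.sort_values(['ShiftStart', 'EndTime']).to_csv('CallingLog.csv', index=False)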
You can adapt this example to your data (which I don't have in my possession :-)
from csv import DictReader, DictWriter
from sys import stdout
# simple, self-contained data
data = '''\
a,b,c
3,2,1
2,2,3
1,3,2
'''.splitlines()
# read the data
dr = DictReader(data)
rows = [row for row in dr]
# print the data
print('# unsorted')
dw = DictWriter(stdout, dr.fieldnames)
dw.writeheader()
dw.writerows(rows)
print('# sorted')
dw = DictWriter(stdout, dr.fieldnames)
dw.writeheader()
dw.writerows(sorted(rows, key=lambda d:d['a']))
# unsorted
a,b,c
3,2,1
2,2,3
1,3,2
# sorted
a,b,c
1,3,2
2,2,3
3,2,1
When you read the data using a DictReader, each element of the list rows is a dictionary, keyed on the field names of the first line of the CSV data file.
When you want to sort this list according to the values corresponding to a key, you have to provide sorted with a key argument, which is a function that returns the value on which you want to sort.
This function is called with the whole element to be sorted, in your case a dictionary, and we want to sort on the first field of the CSV, the one indexed by 'a'. So our function, written inline using the lambda syntax, is just lambda d: d['a'], which returns the value on which we want to sort.
NOTE: the sort in this case is alphabetical, and it works here because I'm dealing with single digits. In general you may need to convert the value (a string by default) to something that makes sense in your context, e.g. lambda d: int(d['a']).
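Tying this back to the asker's data, a hedged sketch (totalData and the field names are taken from the question; datetime.fromisoformat handles the -08:00 offsets on Python 3.7+) could be:
import csv
from datetime import datetime

fields = ['ShiftStart', 'EndTime', 'Primary', 'Secondary']

# sort the parsed rows chronologically on ShiftStart before writing them out
rows = sorted(totalData, key=lambda d: datetime.fromisoformat(d['ShiftStart']))

with open('CallingLog.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fields, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)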

Python code to process CSV file

I am getting a CSV file that is updated on a daily basis. I need to process it and create a new file based on this criteria: if a row contains new data, it should be tagged as new, and if it is an update to existing data, it should be tagged as update. How do I write Python code to process this and output a CSV file as follows, based on the date?
Day1 input data
empid,enmname,sal,datekey
1,cholan,100,8/14/2018
2,ram,200,8/14/2018
Day2 input Data
empid,enmname,sal,datekey
1,cholan,100,8/14/2018
2,ram,200,8/14/2018
3,sundar,300,8/15/2018
2,raman,200,8/15/2018
Output Data
status,empid,enmname,sal,datekey
new,3,sundar,300,8/15/2018
update,2,raman,200,8/15/2018
I'm feeling nice, so I'll give you some code. Try to learn from it.
To work with CSV files, we'll need the csv module:
import csv
First off, let's teach the computer how to open and parse a CSV file:
def parse(path):
    with open(path) as f:
        return list(csv.DictReader(f))
csv.DictReader reads the first line of the csv file and uses it as the "names" of the columns. It then creates a dictionary for each subsequent row, where the keys are the column names.
That's all well and good, but we just want the last version with each key:
def parse(path):
    data = {}
    with open(path) as f:
        for row in csv.DictReader(f):
            data[row["empid"]] = row
    return data
Instead of just creating a list containing everything, this creates a dictionary where the keys are the row's id. This way, rows found later in the file will overwrite rows found earlier in the file.
Now that we've taught the computer how to extract the data from the files, let's get it:
old_data = parse("file1.csv")
new_data = parse("file2.csv")
Iterating through a dictionary gives you its keys, which are the ids defined in the data set. Consistently, key in dictionary says whether key is one of the dictionary's keys. So we can do this:
new = {
    id_: row
    for id_, row in new_data.items()
    if id_ not in old_data
}
updated = {
    id_: row
    for id_, row in new_data.items()
    if id_ in old_data and old_data[id_] != row
}
I'll put csv.DictWriter here and let you sort out the rest on your own.
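As a hedged sketch of that last step (the status labels and the output file name are assumptions based on the sample output above), it might look like:
import csv

fields = ["status", "empid", "enmname", "sal", "datekey"]

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    # write the new rows first, then the updated ones, tagging each with its status
    for status, rows in (("new", new), ("update", updated)):
        for row in rows.values():
            writer.writerow({"status": status, **row})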

How to read a text file into a list or an array with Python

I am trying to read the lines of a text file into a list or array in python. I just need to be able to individually access any item in the list or array after it is created.
The text file is formatted as follows:
0,0,200,0,53,1,0,255,...,0.
Where the ... is above, the actual text file has hundreds or thousands more items.
I'm using the following code to try to read the file into a list:
text_file = open("filename.dat", "r")
lines = text_file.readlines()
print lines
print len(lines)
text_file.close()
The output I get is:
['0,0,200,0,53,1,0,255,...,0.']
1
Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?
You will have to split your string into a list of values using split()
So,
lines = text_file.read().split(',')
EDIT:
I didn't realise there would be so much traction to this. Here's a more idiomatic approach.
import csv
with open('filename.csv', 'r') as fd:
    reader = csv.reader(fd)
    for row in reader:
        print(row)  # do something with each row here
You can also use numpy's loadtxt, like:
from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)
So you want to create a list of lists... We need to start with an empty list
list_of_lists = []
next, we read the file content, line by line
with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # alternatively, if you need to use the file content as numbers:
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)
A common use case is that of columnar data, but our units of storage are the rows of the file, which we have read one by one, so you may want to transpose your list of lists. This can be done with the following idiom
by_cols = zip(*list_of_lists)
Another common use is to give a name to each column
col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]
so that you can operate on homogeneous data items
mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]
Most of what I've written can be sped up using the csv module from the standard library. Another third-party module is pandas, which lets you automate most aspects of a typical data analysis (but has a number of dependencies).
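For instance, a hedged sketch of the same list-of-lists build using the csv module (the file name 'data' matches the example above):
import csv

# equivalent of the manual split/strip loop above, using the csv module
with open('data', newline='') as f:
    list_of_lists = [row for row in csv.reader(f, skipinitialspace=True)]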
Update: while in Python 2 zip(*list_of_lists) returns a different (transposed) list of lists, in Python 3 the situation has changed and zip(*list_of_lists) returns a zip object that is not subscriptable.
If you need indexed access you can use
by_cols = list(zip(*list_of_lists))
that gives you a list of lists in both versions of Python.
On the other hand, if you don't need indexed access and what you want is just to build a dictionary indexed by column names, a zip object is just fine...
file = open('some_data.csv')
names = get_names(next(file))  # get_names is assumed to split the header line into column names
columns = zip(*((x.strip() for x in line.split(',')) for line in file))
d = {}
for name, column in zip(names, columns):
    d[name] = column
This question is asking how to read the comma-separated value contents from a file into an iterable list:
0,0,200,0,53,1,0,255,...,0.
The easiest way to do this is with the csv module as follows:
import csv
with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
Now, you can easily iterate over spamreader like this:
    for row in spamreader:
        print(', '.join(row))
See documentation for more examples.
I'm a bit late, but you can also read the text file into a dataframe and then convert the corresponding column to a list.
import pandas as pd

lista = pd.read_csv('path_to_textfile.txt', sep=",", header=None)[0].tolist()
example.
lista=pd.read_csv('data/holdout.txt',sep=',',header=None)[0].tolist()
Note: the column names of the resulting dataframe will be integers, and I chose 0 because I was extracting only the first column.
Better this way,
def txt_to_lst(file_path):
    try:
        with open(file_path, "r") as stopword:
            lines = stopword.read().split('\n')
        print(lines)
        return lines
    except Exception as e:
        print(e)

Parsing CSV / tab-delimited txt file with Python

I currently have a CSV file which, when opened in Excel, has a total of 5 columns. Only columns A and C are of any significance to me and the data in the remaining columns is irrelevant.
Starting on line 8 and then working in multiples of 7 (ie. lines 8, 15, 22, 29, 36 etc...), I am looking to create a dictionary with Python 2.7 with the information from these fields. The data in column A will be the key (a 6-digit integer) and the data in column C being the respective value for the key. I've tried to highlight this below but the formatting isn't the best:-
A B C D
1 CDCDCDCD
2 VDDBDDB
3
4
5
6
7 DDEFEEF FEFEFEFE
8 123456 JONES
9
10
11
12
13
14
15 293849 SMITH
As per the above, I am looking to extract the value from A7 (DDEFEEF) as a key in my dictionary with "FEFEFEFE" being the respective data, and then add another entry to my dictionary, jumping to line 15 with "293849" being my key and "SMITH" being the respective value.
Any suggestions? The source file is a .txt file with entries being tab-delimited.
Thanks
Clarification:
Just to clarify, so far, I have tried the below:-
import csv
mydict = {}
f = open("myfile", 'rt')
reader = csv.reader(f)
for row in reader:
    print row
The above simply prints out all the content, a row at a time. I did try "for row(7) in reader" but this returned an error. I then researched it and had a go at the below, but it didn't work either:
import csv
from itertools import islice
entries = csv.reader(open("myfile", 'rb'))
mydict = {'key' : 'value'}
for i in xrange(6):
    mydict['i(0)] = 'I(2) # integers representing columns
range = islice(entries,6)
for entry in range:
    mydict[entries(0) = entries(2)] # integers representing columns
Start by turning the text into a list of lists. That will take care of the parsing part:
import csv

lol = list(csv.reader(open('text.txt', 'rb'), delimiter='\t'))
The rest can be done with indexed lookups:
d = dict()
key = lol[6][0] # cell A7
value = lol[6][3] # cell D7
d[key] = value # add the entry to the dictionary
...
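A hedged generalization of those lookups (assuming the wanted rows really are every 7th one starting at spreadsheet row 8, and that the value sits in column C as the question describes; adjust the indices if not):
d = {}
for row in lol[7::7]:          # 0-based index 7 is spreadsheet row 8, then every 7th row
    if len(row) >= 3:          # skip short or blank rows
        d[row[0]] = row[2]     # column A becomes the key, column C the value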
Although there is nothing wrong with the other solutions presented, you could simplify your solution, and scale it more easily, by using Python's excellent library pandas.
Pandas is a library for handling data in Python, preferred by many data scientists.
Pandas has a simplified CSV interface to read and parse files, which can be used to return a list of dictionaries, each containing a single line of the file. The keys will be the column names, and the values will be the ones in each cell.
In your case:
import pandas
def create_dictionary(filename):
    # DataFrame.from_csv was removed in newer pandas; read_csv is the equivalent here
    my_data = pandas.read_csv(filename, sep='\t', index_col=False)
    # Here you can delete the dataframe columns you don't want!
    del my_data['B']
    del my_data['D']
    # ...
    # Now you transform the DataFrame to a list of dictionaries
    list_of_dicts = [item for item in my_data.T.to_dict().values()]
    return list_of_dicts

# Usage:
x = create_dictionary("myfile.csv")
If the file is large, you may not want to load it entirely into memory at once. This approach avoids that. (Of course, making a dict out of it could still take up some RAM, but it's guaranteed to be smaller than the original file.)
my_dict = {}
# "file" here is an already-open file object, e.g. file = open("myfile")
for i, line in enumerate(file):
    if (i - 8) % 7:
        continue
    k, v = line.split("\t")[:3:2]  # keep only columns A and C
    my_dict[k] = v
Edit: Not sure where I got extend from before. I meant update
