Adding items to list of lists - Python

I am looking to use the code I have here to match domains to their DNS resolver name
Current CSV output
domain1 dns1 dns2 dns3 \n domain2 dns1 dns2 dns3 \n etc
This is the incorrect format, because it is adding all domains and dns resolvers to the same row, instead of a new row based on the new domain. They are only separated by a blank cell because of the newline character. I instead want it to be written as below, where each domain & its dns resolvers are written to their own individual row.
Expected CSV output:
domain1 dns1 dns2 dns3
domain2 dns1 dns2 dns3
domain3 dns1 dns2 dns3
I want the CSV file to be written out in the correct format. With the code that I have, every time a domain is passed to dns_resolver, it should move to a new list index. That way, each domain and its DNS resolvers have their own list, so when writing out to a new CSV file, each domain is written to its own row.
The code is not iterating through the list index correctly, and does not add the domain & its dns names to any list because of this. When they are written all into the same list, it works fine, but they are written out all into the same row, which is incorrect. So instead of using 1 list, I am going to use a list of lists, and write each to its own list, and then write each list to the csv file, so that they are in their own rows. Normally the domains will be read into a list from a csv file, but for the sake of this, I entered 3 values.
import dns.resolver
import csv
import os
from os.path import dirname, abspath
r = 0
def dns_resolver(domain):
    server = []
    resolvers = []
    resolvers = dns.resolver.resolve(domain, 'NS')
    for x in resolvers:
        server.append('did not resolve')
    return (domain, *server)
# Read in all domains from csv file domains.csv & count how many domains there are listed
domain_list = ['', '', '']
domain_amount = 0
with open(domainFName, 'r') as file:
    for line in csv.reader(file):
        name = (line)
        domain_amount += 1
for first_domain in domain_list:
for x in first_domain:

You can simply make your dns_resolver function return a list for the given domain.
The *server syntax unpacks each item of server into the new list.
Using a list comprehension, collect all of those lists into a list of lists to write to the CSV.
def dns_resolver(domain):
    # do your dns resolution
    # server = dns.resolver.resolve(domain, 'NS')
    server = ["dns1", "dns2", "dns3", "dns4"]
    return [domain, *server]
# Read in all domains
domain_list = ['', '', '']
print([dns_resolver(d) for d in domain_list])
Output:
[['', 'dns1', 'dns2', 'dns3', 'dns4'],
 ['', 'dns1', 'dns2', 'dns3', 'dns4'],
 ['', 'dns1', 'dns2', 'dns3', 'dns4']]
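To get from the list of lists to one CSV row per domain, something like the following may work. It is only a sketch: the resolver is a placeholder that returns fixed names, the domains are hypothetical, and the helper name write_rows is made up for illustration. It writes to an in-memory buffer so the result can be inspected.

```python
import csv
import io


def dns_resolver(domain):
    # Placeholder resolver: returns fixed names instead of a real DNS lookup.
    server = ["dns1", "dns2", "dns3"]
    return [domain, *server]


def write_rows(rows, handle):
    # writerows() emits one CSV row per inner list.
    csv.writer(handle).writerows(rows)


# Hypothetical domains, used only for illustration.
domain_list = ["example.com", "example.org", "example.net"]
rows = [dns_resolver(d) for d in domain_list]

buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

For a real file, open it with open(path, 'w', newline='') and pass that handle instead of the buffer.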


Extract time values from a list and add to a new list or array

I have a script that reads through a log file that contains hundreds of these logs, and looks for the ones that have a "On, Off, or Switch" type. Then I output each log into its own list. I'm trying to find a way to extract the Out and In times into a separate list/array and then subtract the two times to find the duration of each separate log. This is what the outputted logs look like:
['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a"', '"Type":"Switch"', '"In":"2020-01-31T00:30:20.140Z"']
This is my current code:
logfile = '/path/to/my/logfile'
with open(logfile, 'r') as f:
    text = f.read()
words = ["On", "Off", "Switch"]
text2 = text.split('\n')
for l in text.split('\n'):
    if (words[0] in l or words[1] in l or words[2] in l):
        log = l.split(',')[0:3]
I'm stuck on how to target only the Out and In time values from the logs and put them in an array and convert to a time value to find duration.
Initial log before script: everything after the "In" time is useless for what I'm looking for so I only have the first three indices outputted
2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a","Type":"Switch,"In":"2020-01-31T00:30:20.140Z","Path":"interface","message":"interface changed status from unknown to normal","severity":"INFORMATIONAL","display":true,"json_map":"{\"severity\":null,\"eventId\":\"65e-64d9-45-ab62-8ef98ac5e60d\",\"componentPath\":\"interface_css\",\"displayToGui\":false,\"originalState\":\"unknown\",\"closed\":false,\"eventType\":\"InterfaceStateChange\",\"time\":\"2019-04-18T07:04:32.747Z\",\"json_map\":null,\"message\":\"interface_css changed status from unknown to normal\",\"newState\":\"normal\",\"info\":\"Event created with current status\"}","closed":false,"info":"Event created with current status","originalState":"unknown","newState":"normal"}
Below is a possible solution. The wordmatch line is a bit of a hack, until I find something clearer: it's just a one-liner that creates a set that is either empty or holds the single element True, depending on whether one of the words matches.
import re
logfile = '/path/to/my/logfile'
words = ["On", "Off", "Switch"]
dateformat = r'\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[Zz]?'
pattern = fr'Out:\s*\[(?P<out>{dateformat})\].*In":\s*\"(?P<in>{dateformat})\"'
regex = re.compile(pattern)
with open(logfile, 'r') as f:
    for line in f:
        # empty set if no word occurs in the line, {True} otherwise
        wordmatch = set(filter(None, (word in line for word in words)))
        if wordmatch:
            match = regex.search(line)
            if match:
                intime = match.group('in')
                outtime = match.group('out')
                # whatever to store these strings, e.g., append to list or insert in a dict.
As noted, your log example is very awkward, so this works for the example line, but may not work for every line. Adjust as necessary.
I have also not included (if so wanted), a conversion to a datetime.datetime object. For that, read through the datetime module documentation, in particular datetime.strptime. (Alternatively, you may want to store your results in a Pandas table. In that case, read through the Pandas documentation on how to convert strings to actual datetime objects.)
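As a rough sketch of that conversion, the two time strings from the example line can be parsed with datetime.strptime and subtracted. The format string is an assumption based on the sample log (fractional seconds and a literal trailing Z), and parse_time is a helper name made up here.

```python
from datetime import datetime, timedelta

# Format assumed from the sample log line: ISO-like timestamp with
# fractional seconds and a literal trailing "Z".
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_time(value):
    # Returns a naive datetime; fine for durations between same-zone stamps.
    return datetime.strptime(value, TIME_FORMAT)


outtime = "2020-01-31T00:30:20.150Z"
intime = "2020-01-31T00:30:20.140Z"

# Subtracting two datetimes gives a timedelta.
duration = parse_time(outtime) - parse_time(intime)
print(duration.total_seconds())
```

Storing the timedelta (or its total_seconds()) per log entry gives one duration per matched line.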
You also don't need to read and split on newlines yourself: for line in f will do that for you (provided f is indeed a file handle).
Regex is probably the way to go (speed, efficiency, etc.) ... but ...
You could take a very simplistic (if very inefficient) approach of cleaning your data:
join all of it into a string
replace things that hinder easy parsing
split wisely and filter the split
like so:
data = ['2020-01-31T12:04:57.976Z 1234 Out: [2020-01-31T00:30:20.150Z] Id: {"Id":"4-f-4-9-6a"', '"Type":"Switch"', '"In":"2020-01-31T00:30:20.140Z"']
all_text = " ".join(data)
# this is inefficient and will create throwaway intermediate strings - if you are
# in a hurry or operate on 100s of MB of data, this is NOT the way to go, unless
# you have time
# iterate pairs of ("bad thing", "what to replace it with") (or list of bad things)
for thing in [ (": ",":"), (list('[]{}"'),"") ]:
    whatt = thing[0]
    withh = thing[1]
    # if list, do so for each bad thing
    if isinstance(whatt, list):
        for p in whatt:
            # replace it
            all_text = all_text.replace(p,withh)
    else:
        # single string: replace it directly
        all_text = all_text.replace(whatt,withh)
# format is now far better suited to splitting/filtering
cleaned = [a for a in all_text.split(" ")
           if any(a.startswith(prefix) or "Switch" in a
                  for prefix in {"In:","Switch:","Out:"})]
['Out:2020-01-31T00:30:20.150Z', 'Type:Switch', 'In:2020-01-31T00:30:20.140Z']
After cleaning your data would look like:
2020-01-31T12:04:57.976Z 1234 Out:2020-01-31T00:30:20.150Z Id:Id:4-f-4-9-6a Type:Switch In:2020-01-31T00:30:20.140Z
You can transform the clean list into a dictionary for ease of lookup:
d = dict( part.split(":",1) for part in cleaned)
will produce:
{'In': '2020-01-31T00:30:20.140Z',
'Type': 'Switch',
'Out': '2020-01-31T00:30:20.150Z'}
You can use the datetime module to parse the times from your values, e.g. with datetime.strptime as mentioned in the other answer.

How to get the same name with multiple value get unique results in Python

I have a large CSV file (at least 1 GB) whose URLs I compare against the URLs listed in my txt file.
When the same name appears with multiple values, how can I get only the unique values for it in Python? And is there a faster way to compare the two files, given how large the CSV is?
[01/Nov/2019:09:54:26 +0900] ","","","","200","CONNECT","","555976","1508"
[01/Nov/2019:09:54:26 +0900] ","","","","200","CONNECT","","555976","1508"
[01/Nov/2019:09:54:26 +0900] ","","","","200","CONNECT","","555976","1508"
[01/Nov/2019:09:54:26 +0900] ","","","21.323.12.96","200","CONNECT","","555976","1508"
[01/Nov/2019:09:54:26 +0900] ","","","","200","CONNECT","","555976","1508"
[01/Nov/2019:09:54:26 +0900] ","","","","200","CONNECT","","555976","1508"
[01/Nov/2019:09:54:26 +0900] ","","","","200","CONNECT","","555976","1508"
1 shop
1 shop
import csv
with open("file1.csv", 'r') as f:
reader = csv.reader(f)
for k in reader:
ko = set()
srcip = k[2]
url = k[6]
lines = url.replace(":443", "").replace(":8080", "")
war = lines.split("//")[-1].split("/")[0].split('?')[0]
for to in ko:
with open("file2.txt", "r") as f:
all_val = set()
for i in f:
val = i.strip().split(" ")[1]
if val in to[0]:
for ki in all_val:
my output:
('', '')
('', '')
('', '')
('', '')
('', '')
('', '')
If the URL is the same, how do I collect all of its values, keeping each value only once?
How can I get results like this?
Short answer: you can't do so directly. Well, you can, but with poor performance.
CSV is a good storage format, but if you want to do something like this you might want to store everything in another, custom data file. You could first parse your file so that it holds only unique IDs instead of long strings (like amazon = 0, wakers = 1 and so on), which performs better and reduces the cost of each comparison.
The thing is, those tricks are awkward to apply to a CSV whose content varies. Memory mapping, or building a database from your CSV, might also work well (making the changes in the database and only dumping a CSV when you need one).
See "How do quickly search through a .csv file in Python" for a more complete answer.
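The database idea could be sketched with the standard-library sqlite3 module, roughly as below. The two-column layout, the table and index names, and the sample rows are all assumptions for illustration; the real CSV has more columns and would be read from disk rather than from an in-memory string.

```python
import csv
import io
import sqlite3

# Hypothetical rows standing in for the large CSV: (source IP, domain).
sample_csv = io.StringIO(
    "10.0.0.1,shop.example.com\n"
    "10.0.0.2,shop.example.com\n"
    "10.0.0.1,shop.example.com\n"
    "10.0.0.3,news.example.org\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (ip TEXT, domain TEXT)")
# csv.reader yields one list per row, which executemany binds to the two columns.
conn.executemany("INSERT INTO hits VALUES (?, ?)", csv.reader(sample_csv))
# An index on domain keeps per-domain lookups from scanning every row.
conn.execute("CREATE INDEX idx_hits_domain ON hits (domain)")

unique_ips = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT ip FROM hits WHERE domain = ? ORDER BY ip",
        ("shop.example.com",),
    )
]
print(unique_ips)
```

Using a file path instead of ":memory:" would keep the database on disk, so the CSV only has to be parsed once.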
Problem solution
import csv
import re
def possible_urls(filename, category, category_position, url_position):
    # Here we will read a txt file to create a list of domains, that could correspond to shops
    domains = []
    with open(filename, "r") as file:
        file_content = file.readlines()
    for line in file_content:
        # strip() drops the trailing newline so the last field compares cleanly
        info_in_line = line.strip().split(" ")
        # Here I use a regular expression to parse the domain from the url.
        domain = re.sub('www.', '', info_in_line[url_position])
        if info_in_line[category_position] == category:
            domains.append(domain)
    return domains
def read_from_csv(filename, ip_position, url_position, possible_domains):
    # Here we will create a dictionary that holds
    # all ips that each domain can have.
    # Dictionary will look like this:
    # {domain_name: [list of possible ips]}
    domain_ip = {domain: [] for domain in possible_domains}
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        for line in reader:
            if len(line) <= max(ip_position, url_position):
                print(f'Not enough items in line {line}, to obtain url or ip')
                continue
            ip = line[ip_position]
            url = line[url_position]
            # Using python regular expression to get a domain name
            # from url.
            match = re.search(r'//[w]?[w]?[w]?\.?(.[^/]*)[:|/]', url)
            if match is None:
                # url does not have the expected shape, skip the line
                continue
            domain = match.group(1)
            if domain in domain_ip.keys():
                domain_ip[domain].append(ip)
    return domain_ip
def print_fomatted_result(result):
    # Prints formatted result
    for shop_domain in result.keys():
        print(f'{shop_domain}: ')
        for shop_ip in result[shop_domain]:
            print(f' {shop_ip}')
def create_list_of_shops():
    # Function that first creates a list of possible domains, and
    # then reads the ips for those domains from csv
    possible_domains = possible_urls('file2.txt', 'shop', 2, 1)
    shop_domains_with_ip = read_from_csv('file1.csv', 2, 6, possible_domains)
    # Display result, we get in previous operations
    print_fomatted_result(shop_domains_with_ip)
Dictionary of ip's where domains are keys, so you can get all possible ip's for domain by giving a name of that domain:
{'': ['', '', '', '', ''], '': ['']}
Regular expressions
A very useful thing you can learn from the solution is regular expressions. Regular expressions are a tool that lets you filter or retrieve information from lines in a very convenient way. They also greatly reduce the amount of code, which makes the code more readable and safer.
Let's consider your code for removing ports from strings and think about how we can replace it with a regex.
lines = url.replace(":443", "").replace(":8080", "")
Replacing ports this way is fragile, because you can never be sure which port numbers will actually appear in a url. What if port number 5460 or port number 1022 shows up? For each such port you will add a new replace, and soon your code will look something like this:
lines = url.replace(":443", "").replace(":8080", "").replace(":5460","").replace(":1022","")...
Not very readable. But with a regular expression you can describe a pattern. And the great news is that we actually know the pattern for a url with a port number. They all look like this:
:some_digits. So if we know the pattern, we can describe it with a regular expression and tell Python to find everything that matches it and replace it with the empty string '':
re.sub(r':\d+', '', url)
This tells the Python regular expression engine:
Look for every run of digits in the string url that follows a : and replace it, together with the :, with the empty string. This solution is shorter, safer and far more readable than the chain of replace calls, so I suggest you read about regular expressions a little. A great resource for learning about regular expressions is
this site. Here you can test your regex.
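A small sketch of the port-stripping pattern applied to a few URLs. The URLs are hypothetical and strip_port is a helper name made up for illustration. Note that the pattern removes any colon followed by digits, so it would also alter a path or query string that happens to contain something like :123.

```python
import re


def strip_port(url):
    # Remove a colon followed by one or more digits, e.g. ":8080".
    return re.sub(r":\d+", "", url)


for url in ["http://example.com:8080/path", "example.org:443", "http://example.net/"]:
    print(strip_port(url))
```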
Explanation of Regular expressions in code
re.sub('www.', '', info_in_line[url_position])
Look for all www. in the string info_in_line[url_position] and replace them with the empty string.
re.search('www.(.[^/]*)[:|/]', url).group(1)
Let's split it into parts:
[^/] - any character except /
(.[^/]*) - here I used a capturing group. It tells the engine which part of the match we are interested in.
[:|/] - the characters that may follow the capturing group: : or /. (Inside a character class, | is not an "or" operator but a literal character, so a | is accepted there too.)
So, summarizing, the regex can be expressed in words as follows:
Find the first substring that starts with www. and ends with : or /, and return everything that stands between them.
group(1) - means get the contents of the first capturing group.
Hope the answer is helpful!
If you used the URL as the key in a dictionary, and had your IP address sets as the elements of the dictionary, would that achieve what you intended?
my_dict = {
'' = {
'' = {''},
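One way to build such a mapping, sketched with collections.defaultdict so that each URL maps to a set of IP addresses. The (url, ip) pairs are hypothetical stand-ins for the values parsed from the CSV, and ips_by_url is a name chosen here.

```python
from collections import defaultdict

# Hypothetical (url, ip) pairs standing in for values parsed from the CSV.
pairs = [
    ("shop.example.com", "10.0.0.1"),
    ("shop.example.com", "10.0.0.2"),
    ("shop.example.com", "10.0.0.1"),  # duplicate, ignored by the set
    ("news.example.org", "10.0.0.3"),
]

# Missing keys start out as an empty set, so add() needs no existence check.
ips_by_url = defaultdict(set)
for url, ip in pairs:
    ips_by_url[url].add(ip)

for url in sorted(ips_by_url):
    print(url, sorted(ips_by_url[url]))
```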
## I have used your code to get your desired output
## Copy paste the code & execute to get the result
import csv
url_dict = {}
## STEP 1: Open file2.txt to get url names
with open("file2.txt", "r") as f:
    for i in f:
        val = i.strip().split(" ")[1]
        url_dict[val] = []
## STEP 2: 2.1 Open csv file 'file1.csv' to extract url name & ip address
## 2.2 Check if url from file2.txt is available from the extracted url from 'file1.csv'
## 2.3 Create a dictionary with the matched url & its ip address
## 2.4 Remove duplicates in ip addresses from same url
with open("file1.csv", 'r') as f: ## 2.1
    reader = csv.reader(f)
    for k in reader:
        #ko = set()
        srcip = k[2]
        url = k[6]
        lines = url.replace(":443", "").replace(":8080", "")
        war = lines.split("//")[-1].split("/")[0].split('?')[0]
        for key, value in url_dict.items():
            if key in war: ## 2.2
                url_dict[key].append(srcip) ## 2.3
## 2.4
for key, value in url_dict.items():
    url_dict[key] = list(set(value))
## STEP 3: Print dictionary output to .TXT file
with open('output_text.txt', 'w') as file3:
    for key, value in url_dict.items():
        file3.write('\n' + key + '\n')
        for item in value:
            file3.write(' '*15 + item + '\n')

Matching Outlook Calendar Dates to CSV Dates and Appending Match to List

Disclaimer: I'm relatively new to Python. I am attempting to write a script that will go through a CSV, check if certain columns match Outlook calendar items (by subject, organizer, and date match), and then have the script note that there was a successful match in a new column (I stole heavily from this question). Below is my whole script.
import win32com.client, datetime, re, os, csv, shutil
# set cwd, create copy of original CSV, and access outlook API
shutil.copy('masterCheck.csv', 'inspectCheck.csv')
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inspectors = {'ASMITH': 'Smith, Aaron', 'JDOE': 'Doe, Jane', 'KZEEBOP': 'Zeebop, Kim'}
#access csv and put it into a list
with open('inspectCheck.csv', 'r', newline = '', encoding='utf-8') as csvAppointments:
reader = csv.reader(csvAppointments)
masterList = list(reader)
del masterList[-1] # delete blank nested list
del masterList[1] #delete header
for i in masterList: # switch out names so they can match Outlook descriptors later
for key, value in inspectors.items():
if i[3] in key:
i[3] = value
# create another list for appending later in the script
finalList = []
finalList += masterList
# access the inspectors' calendars
x = 0
for inspector in inspectors.values():
recipient = outlook.createRecipient(inspector)
resolved = recipient.Resolve()
sharedCalendar = outlook.GetSharedDefaultFolder(recipient, 9)
codeAppointments = sharedCalendar.Items
#restrict to items in the next year
begin = datetime.date.today()
end = begin + datetime.timedelta(days = 365);
restriction = "[Start] >= '" + begin.strftime("%m/%d/%Y") + "' AND [End] <= '" +end.strftime("%m/%d/%Y") + "'"
restrictedItems = codeAppointments.Restrict(restriction)
# loop through inspectors' appointments and match
for appointmentItem in restrictedItems:
for i in masterList:
addressSearch = i[1]
if re.search('%s' % addressSearch, appointmentItem.Subject, re.IGNORECASE)\
and i[3] in appointmentItem.Organizer\
and i[4] in appointmentItem.Start:
x += 1
except IndexError:
# update spreadsheet
with open('inspectCheckFinal.csv', 'w', newline = '') as csvAppointments:
appointmentsWriter = csv.writer(csvAppointments)
I have had success matching columns from my CSV to Outlook items. For example, I can get this to match.
if re.search('%s' % addressSearch, appointmentItem.Subject, re.IGNORECASE)
However, as soon as I try to match it to my date column (i[4]), it throws the error: TypeError: argument of type 'pywintypes.datetime' is not iterable. The dates in my CSV look like 2/11/2018 but the dates (when printed) in Outlook look like 2017-11-16 11:00:00+00:00. I'm at a loss on how to match these two.
Moreover, I am having trouble marking the CSV with successful matches. While the script will append a value to the end of each nested list (and then write it to a CSV), it will not append the value to the matched row of the CSV. For example, my output looks like:
Inspector |Address |Date |Success?(print Address)
ASMITH |212 Clark St|11/21/18 |Yes. 33 Blart Ave
ASMITH |33 Blart Ave|11/20/18 |Yes. 212 Clark St
My guess is that my script finds a match in Outlook, and then appends that value to the end of a nested list. What I would like it to do is to match it in the row/nested list where it was actually matched. Apologies for the long post and thank you to those who read it.
Nevermind, I solved the TypeError by converting
and i[4] in appointmentItem.Start:
# to
and i[4] in str(appointmentItem.Start):
Moreover, I reformatted my CSV beforehand so it will now match Outlook's format. As for my matches being appended in the wrong rows, I think I will solve that by appending matches to a separate CSV/dataframe, and then joining that dataframe to the original CSV/dataframe.
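An alternative to comparing strings is to parse the CSV date and compare it with the date part of the appointment start, and to append the flag to the row being checked so it lands in the matching row. The sketch below uses plain datetime objects and made-up rows in place of the Outlook items. It assumes the CSV dates are month/day/year and that the Outlook start value behaves like a datetime.datetime (has a date() method). same_day and mark_matches are hypothetical helper names.

```python
import datetime


def same_day(csv_date, appointment_start, fmt="%m/%d/%Y"):
    # Parse the CSV text into a date, then compare it with the date part
    # of the appointment start instead of testing string containment.
    parsed = datetime.datetime.strptime(csv_date, fmt).date()
    return parsed == appointment_start.date()


def mark_matches(rows, appointments):
    # rows: [inspector, address, date]; appointments: (subject, organizer, start).
    # The flag is appended to the row that was actually compared.
    for row in rows:
        inspector, address, date_text = row[0], row[1], row[2]
        matched = any(
            address.lower() in subject.lower()
            and inspector in organizer
            and same_day(date_text, start)
            for subject, organizer, start in appointments
        )
        row.append("Yes" if matched else "No")
    return rows


# Hypothetical data in place of the real CSV and Outlook items.
utc = datetime.timezone.utc
rows = [
    ["Smith, Aaron", "212 Clark St", "11/21/2018"],
    ["Smith, Aaron", "33 Blart Ave", "11/20/2018"],
]
appointments = [
    ("Inspection: 212 Clark St", "Smith, Aaron",
     datetime.datetime(2018, 11, 21, 11, 0, tzinfo=utc)),
]
print(mark_matches(rows, appointments))
```

Because each row is flagged in place, no separate counter or second list is needed to keep matches aligned with their rows.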

Can't get unique word/phrase counter to work - Python

I'm having trouble getting anything to write in my output file (word_count.txt).
I expect the script to review all 500 phrases in my phrases.txt document, and output a list of all the words and how many times they appear.
from re import findall,sub
from os import listdir
from collections import Counter
# path to folder containg all the files
str_dir_folder = '../data'
# name and location of output file
str_output_file = '../data/word_count.txt'
# the list where all the words will be placed
list_file_data = '../data/phrases.txt'
# loop through all the files in the directory
for str_each_file in listdir(str_dir_folder):
if str_each_file.endswith('data'):
# open file and read
with open(str_dir_folder+str_each_file,'r') as file_r_data:
str_file_data =
# add data to list
# clean all the data so that we don't have all the nasty bits in it
str_full_data = ' '.join(list_file_data)
str_clean1 = sub('t','',str_full_data)
str_clean_data = sub('n',' ',str_clean1)
# find all the words and put them into a list
list_all_words = findall('w+',str_clean_data)
# dictionary with all the times a word has been used
dict_word_count = Counter(list_all_words)
# put data in a list, ready for output file
list_output_data = []
for str_each_item in dict_word_count:
str_word = str_each_item
int_freq = dict_word_count[str_each_item]
str_out_line = '"%s",%d' % (str_word,int_freq)
# populates output list
# create output file, write data, close it
file_w_output = open(str_output_file,'w')
Any help would be great (especially if I'm able to actually output 'single' words within the output list.
thanks very much.
It would be helpful if we got more information, such as what you've tried and what sorts of error messages you received. As kaveh commented above, this code has some major indentation issues. Once I got around those, there were a number of other logic errors to work through. I've made some assumptions:
list_file_data is assigned to '../data/phrases.txt', but there is then a loop through all files in a directory. Since you don't have any handling for multiple files elsewhere, I've removed that logic and referenced the file listed in list_file_data (and added a small bit of error handling). If you do want to walk through a directory, I'd suggest using os.walk().
You named your file 'phrases.txt' but then check whether the file names end with 'data'. I've removed this logic.
You've placed the data set into a list when findall works just fine with strings and ignores the special characters that you've manually removed.
Changed 'w+' to '\w+'.
Converting to a list outside of the output loop isn't necessary - your dict_word_count is a Counter object, which has an 'iteritems' method ('items' on Python 3) to roll through each key and value. Also changed the variable name to 'counter_word_count' to be slightly more accurate.
Instead of manually generating CSVs, I've imported csv and utilized the writerow method (and quoting options).
Code below, hope this helps:
import csv
import os
from collections import Counter
from re import findall,sub
# name and location of output file
str_output_file = '../data/word_count.txt'
# path to the file that holds the phrases
list_file_data = '../data/phrases.txt'
if not os.path.exists(list_file_data):
    raise OSError('File {} does not exist.'.format(list_file_data))
with open(list_file_data, 'r') as file_r_data:
    str_file_data = file_r_data.read()
# find all the words and put them into a list
list_all_words = findall(r'\w+',str_file_data)
# dictionary with all the times a word has been used
counter_word_count = Counter(list_all_words)
with open(str_output_file, 'w') as output_file:
    fieldnames = ['word', 'freq']
    writer = csv.writer(output_file, quoting=csv.QUOTE_ALL)
    # items() works on Python 2 and 3; iteritems() exists only on Python 2
    for key, value in counter_word_count.items():
        output_row = [key, value]
        writer.writerow(output_row)
Something like this?
from collections import Counter
from glob import glob
def extract_words_from_line(s):
    # make this as complicated as you want for extracting words from a line
    return s.strip().split()
# add up one Counter per line, starting from an empty Counter
tally = sum(
    (Counter(extract_words_from_line(line))
     for infile in glob('../data/*.data')
     for line in open(infile)),
    Counter())
for k in sorted(tally, key=tally.get, reverse=True):
    print(k, tally[k])
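For reference, a compact Python 3 sketch of the same idea: count words with Counter and write them out, most frequent first, with the csv module. The sample text is hypothetical and stands in for the contents of phrases.txt. Lowercasing before counting is an added choice, not something either answer above does.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical phrases standing in for the contents of phrases.txt.
text = "the quick brown fox\nthe lazy dog\nThe end\n"

# \w+ matches runs of letters, digits and underscores; lower() merges "The" and "the".
counts = Counter(re.findall(r"\w+", text.lower()))

output = io.StringIO()
writer = csv.writer(output, quoting=csv.QUOTE_ALL)
# most_common() yields (word, count) pairs from most to least frequent.
writer.writerows(counts.most_common())
print(output.getvalue())
```

Swapping the StringIO for open('../data/word_count.txt', 'w', newline='') would write the same rows to the output file.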

Need some help in deleting the data from list and again append it in same list

I have developed a Django app where a user can upload multiple files. I can upload all of the files and store their paths in a MySQL database as a single comma-separated list. For example, I have uploaded three files:
1.Logging a Defect.docx,
2.Mocks (1).pptx and
3.Mocksv2.pptx
and they get stored in the database as follows (each individual file path is put into a list and all the paths are joined, which results in this form):
FileStore/client/Logging a Defect.docx,FileStore/client/Mocks (1).pptx,FileStore/client/Mocksv2.pptx,
Now I need help while deleting particular file. For example when I'm deleting Logging a Defect.docx then I should be deleting first element of list alone and retain the other two paths. I'll be sending only name of document.
I'm retrieving the path as list and then I have to check if the name of doc being passed is there in each element of the list and if it matches then I should delete that element keeping the other elements intact. How to approach this ? It sounds like more of python question than Django question.
Use a list comprehension to filter the split text, and rebuild the string using the join function
>>> db_path = 'FileStore/client/Logging a Defect.docx,FileStore/client/Mocks (1).pptx,FileStore/client/Mocksv2.pptx'
>>> file_to_delete = 'Logging a Defect.docx'
>>> file_separator = ","
>>> new_db_path = [
... path.strip()
... for path in db_path.split(file_separator)
... if path.strip() and file_to_delete not in path
... ]
>>> string_to_save = file_separator.join(new_db_path)
>>> string_to_save
'FileStore/client/Mocks (1).pptx,FileStore/client/Mocksv2.pptx'
You can read the text from your database, use the remove method of the Python list, and then write the new value back to the database:
text = "FileStore/client/Logging a Defect.docx,FileStore/client/Mocks (1).pptx,FileStore/client/Mocksv2.pptx,"
splitted = text.split(',')
# filename is the one you want to delete
entry = "FileStore/client/{filename}".format(filename="Mocks (1).pptx")
if entry in splitted:
    splitted.remove(entry)
newtext = ""
for s in splitted:
    if s:  # skip the empty string left by the trailing comma
        newtext += s
        newtext += ','
# now write back newtext to database
Not boasting or anything but I came up with my own logic for my question. It looks far less complicated but it works fine.
db_path = 'FileStore/client/Logging a Defect.docx,FileStore/client/Mocks (1).pptx,FileStore/client/Mocksv2.pptx'
path_list = db_path.split(",")
doc = 'Logging a Defect.docx'
y = []
for i in path_list :
    if doc in i:
        continue  # skip the path of the document being deleted
    y.append(i)
new_path = ",".join(y)
print(new_path)
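The substring test (doc in i) would also drop any path that merely contains the document name. A slightly stricter variant compares only the final path component, sketched below. The function name remove_document is made up for illustration, and it assumes the stored paths always use forward slashes.

```python
import posixpath


def remove_document(db_path, doc_name, separator=","):
    # Keep every non-empty path whose final component is not doc_name.
    kept = [
        path
        for path in (p.strip() for p in db_path.split(separator))
        if path and posixpath.basename(path) != doc_name
    ]
    return separator.join(kept)


db_path = (
    "FileStore/client/Logging a Defect.docx,"
    "FileStore/client/Mocks (1).pptx,"
    "FileStore/client/Mocksv2.pptx,"
)
print(remove_document(db_path, "Mocks (1).pptx"))
```

The returned string has no trailing comma. Append one before saving if the stored format expects it.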
