Creating a list of lists using the append function - Python

I have two functions I have created as follows
def load_dates(stations):
    f = open(stations[0] + '.txt', 'r')
    dates = []
    for line in f:
        dates.append(line.split()[0])
    f.close()
    return dates

stations = load_stations("stations.txt")
dates = load_dates(stations)
and
def load_station_data(station):
    f = open(station + '.txt', 'r')
    temp = []
    for line in f:
        x = line.split()[1]
        x = x.strip()
        temp.append(x)
    f.close()
    return temp
The first function retrieves the dates (the first column) from a separate file (hence the open-file call), while the second retrieves the temperatures from a specific station file, stripping the surrounding whitespace.
Dates Temp
19600101 29.2
19600102 29.4
19600103 29.5
The question I have now is: how can I make a new function collect the temperature list for each station file into one corresponding list of lists?
For example, every station (city) has its own list of temperatures. I know I need to create an empty list, iterate through the stations with a for loop, and then append what I get from each iteration
to the empty list using the append function. I am new to Python, so I am struggling with the part said above.
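A minimal sketch of that loop, reusing the function shape from the question and assuming each station file looks like the sample data above (the file names here are hypothetical):

```python
def load_station_data(station):
    # read the temperature column (second field) from "<station>.txt"
    temps = []
    with open(station + '.txt') as f:
        for line in f:
            temps.append(line.split()[1].strip())
    return temps

def load_all_temps(stations):
    # one inner list of temperatures per station, in station order
    all_temps = []
    for station in stations:
        all_temps.append(load_station_data(station))
    return all_temps
```

load_all_temps(['city1', 'city2']) would then return a list of two lists, the first holding city1's temperatures and the second city2's.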

Instead of using lists, it's better to use dictionaries here.
# "{}" creates a dictionary
cities = {}
# put the files you want to parse in this tuple
file_list = ("city1.txt", "city2.txt")
for file_name in file_list:
    file = open(file_name, 'r')
    # we don't want the .txt in the name, so we'll cut it off;
    # split cuts file_name into a list at every dot
    city_name = file_name.split('.')[0]
    # "[]" creates an empty list
    cities[city_name] = []
    for line in file:
        # strip removes surrounding whitespace, split cuts the line into fields
        values = line.strip().split()
        # verify the length; we don't want to use empty lines
        if len(values) == 2:
            cities[city_name].append(values[1])
    file.close()
I hope this will do what you want
Edit:
All the cities and the values are now in the dictionary 'cities'. If you want to access a specific city's temps, you can do it like this:
print(cities["cityname"])
and if you want to read all the data, you can print the whole dict (use items() on Python 3; iteritems() only exists on Python 2):
for key, temperatures in cities.items():
    print("City: " + key)
    for temperature in temperatures:
        print("\t" + temperature)

Agree with @Morb that a dict sounds more sensible, but in answer to your original question you can certainly append a list to a list (as opposed to extend). So, say each line in your file was like:
19600101 29.2 28.4 25.6 30.2
19600102 26.2 24.4 21.6 30.5
you could
temp.append(line.split()[1:])
and end up with a list of lists
[['29.2', '28.4', '25.6', '30.2'],['26.2', '24.4', '21.6', '30.5']]

I am not sure I get the problem either, but maybe you should:
Do only one loop to get both temperature and date:
def load_all_data(stations):
    f = open(stations[0] + '.txt')
    dates, temps = [], []
    for line in f.readlines():
        dates.append(line.split()[0])
        temps.append(line.split()[1].strip())
    f.close()
    return dates, temps
use list comprehensions (the lines are read once up front, since a second pass over the same file handle would find it exhausted):
def load_all_data(stations):
    f = open(stations[0] + '.txt')
    lines = f.readlines()
    dates = [line.split()[0] for line in lines]
    temps = [line.split()[1].strip() for line in lines]
    f.close()
    return dates, temps
use a context manager for open as suggested by cool_jesus:
def load_data_all(stations):
    with open(stations[0] + '.txt') as f:
        lines = f.readlines()
    dates = [line.split()[0] for line in lines]
    temps = [line.split()[1].strip() for line in lines]
    return dates, temps
do a loop on stations:
def load_data_all(stations):
    data_stations = []
    for station in stations:
        with open(station + '.txt') as f:
            lines = f.readlines()
        dates = [line.split()[0] for line in lines]
        temps = [line.split()[1].strip() for line in lines]
        data_stations.append((temps, dates))
    return data_stations

Related

Making dictionary in dictionary to separate data by the same values in one column and then from second column

I am new to Python and I have been stuck on one problem for a few days now. I made a script that:
- takes data from a CSV file
- sorts it by equal values in the first column of the data file
- inserts the sorted data at a specified line in a different template text file
- saves the file in as many copies as there are distinct values in the first column of the data file
This picture below shows how it works:
But there are two more things I need to do. When, in the separate files shown above, some of the values from the second column of the data file are the same, the file should insert the value from the third column instead of repeating the same value from the second column. In the picture below I showed how it should look:
What I also need is to add somewhere the value of the first column from the data file, separated by "_".
Here is the data file:
111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG
and here is the code I made:
import shutil

with open("data.csv") as f:
    contents = f.read()
contents = contents.splitlines()
values_per_baseline = dict()
for line in contents:
    key = line.split(',')[0]
    values = line.split(',')[1:]
    if key not in values_per_baseline:
        values_per_baseline[key] = []
    values_per_baseline[key].append(values)
for file in values_per_baseline.keys():
    x = 3
    shutil.copyfile("of.txt", f"of_{file}.txt")
    filename = f"of_{file}.txt"
    for values in values_per_baseline[file]:
        with open(filename, "r") as f:
            contents = f.readlines()
        contents.insert(x, ' o = ' + values[0] + '\n ' + 'a = ' + values[1] + '\n')
        with open(filename, "w") as f:
            contents = "".join(contents)
            f.write(contents)
I have been trying to make something like a dictionary of dictionaries of lists, but I can't implement it in a way that works. Any help or suggestion will be much appreciated.
You could try the following:
import csv
from collections import defaultdict

values_per_baseline = defaultdict(lambda: defaultdict(list))
with open("data.csv", "r") as file:
    for key1, key2, value in csv.reader(file):
        values_per_baseline[key1][key2].append(value)

x = 3
for filekey, content in values_per_baseline.items():
    with open("of.txt", "r") as fin,\
         open(f"of_{filekey}.txt", "w") as fout:
        fout.writelines(next(fin) for _ in range(x))
        for key, values in content.items():
            fout.write(
                f' o = {key}\n'
                + ' a = '
                + ' <COMMA> '.join(values)
                + '\n'
            )
        fout.writelines(fin)
The input-reading part is using the csv module from the standard library (for convenience) and a defaultdict. The file is read into a nested dictionary.
Content of datafile.csv:
111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG
Possible solution is the following:
def nested_list_to_dict(lst):
    result = {}
    subgroup = {}
    if all(len(l) == 3 for l in lst):
        for first, second, third in lst:
            result.setdefault(first, []).append((second, third))
        for k, v in result.items():
            for item1, item2 in v:
                subgroup.setdefault(item1, []).append(item2.strip())
            result[k] = subgroup
            subgroup = {}
    else:
        print("Input data must have 3 items like '111_0,3005,QWE'")
    return result

with open("datafile.csv", "r", encoding="utf-8") as f:
    content = f.read().splitlines()

data = nested_list_to_dict([line.split(',') for line in content])
print(data)
# ... rest of your code ....
Prints
{'111_0': {'3005': ['QWE'], '3006': ['SDE', 'LFR']},
'111_1': {'3005': ['QWE'], '5345': ['JTR']},
'112_0': {'3103': ['JPP'], '3343': ['PDK']},
'113_0': {'2137': ['TRE', 'OMG']}}

Storing lines into array and print line by line

Current code:
filepath = "C:/Bg_Log/KLBG04.txt"
with open(filepath) as fp:
    lines = fp.read().splitlines()
with open(filepath, "w") as fp:
    for line in lines:
        print("KLBG04", line, line[18], file=fp)
output:
KLBG04 20/01/03 08:09:13 G0001 G
I need the flexibility to move the columns around and also to manipulate the date, as shown below, using an array or list:
KLBG04 03/01/20 G0001 G 08:09:13
You didn't provide sample data, but I think this may work:
filepath = "C:/Bg_Log/KLBG04.txt"
with open(filepath) as fp:
    lines = fp.read().splitlines()
with open(filepath, "w") as fp:
    for line in lines:
        ln = "KLBG04 " + line + " " + line[18]  # current column order
        sp = ln.split()                          # split at spaces
        dt = '/'.join(sp[1].split('/')[::-1])    # reverse date
        print(sp[0], dt, sp[3], sp[-1], sp[-2], file=fp)  # new column order, written back to the file
        # print("KLBG04", line, line[18], file=fp)
Try to split() the line first, then print the list in your desired order
from datetime import datetime  # use the datetime module to manipulate the date

filepath = "C:/Bg_Log/KLBG04.txt"
with open(filepath) as fp:
    lines = fp.read().splitlines()
with open(filepath, "w") as fp:
    for line in lines:
        date, time, venue = line.split(" ")  # split the line up
        date = datetime.strptime(date, '%y/%m/%d').strftime('%d/%m/%y')  # reformat the date
        print("KLBG04", date, venue, venue[0], time, file=fp)  # print in your desired order
Why don't you store the output as a string itself, use the split() method to split the string at each space, and then call split() again on index 1 (the element that contains the date), splitting at each / (so that you can then rearrange the date)?
for line in lines:
    # rather than printing, store the output in a string
    output = "KLBG04 " + line + " " + line[18]
    x = output.split(" ")
    date_output = x[1].split("/")
    # now you can just rearrange the parts and print them however you want
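As a runnable sketch of that idea (a single hard-coded line stands in for the file data):

```python
line = "20/01/03 08:09:13 G0001"
output = "KLBG04 " + line + " " + line[18]  # build the output string instead of printing it
x = output.split(" ")                        # ['KLBG04', '20/01/03', '08:09:13', 'G0001', 'G']
date_output = x[1].split("/")                # ['20', '01', '03']
# rearrange the date parts however you need, e.g. day/month/year:
new_date = "/".join(reversed(date_output))
print(new_date)  # 03/01/20
```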
Try this:
for line in lines:
    words = line.split()               # split every word
    date_values = words[0].split('/')  # split the word that contains the date
    # create a dictionary as follows (the input date is YY/MM/DD)
    date_format = ['YY', 'MM', 'DD']
    date_dict = dict(zip(date_format, date_values))
    # now create a new variable with the changed format (DD/MM/YY)
    new_date_format = date_dict['DD'] + '/' + date_dict['MM'] + '/' + date_dict['YY']
    print(new_date_format)
    # replace the first word [index 0 holds the date] with the new date format
    words[0] = new_date_format
    # join all the words to form a new line
    new_line = ' '.join(words)
    print("KLBG04", new_line, line[18])

Sorted Lists not working properly in python

I'm sorting a CSV file with 14 titles and 500,000 records in it. I organized them using the enumerate function. However, when I use the sorted() function to sort them by a key value (e.g. total profit), it only returns numbers under 100,000 (i.e. 99,992.36, when in reality some values go into the millions).
When I switch to a different key value (e.g. total cost), I run into the same issue; however, if that specific record happened to have a total profit over 100,000, that value does show. So I think I've narrowed it down to my sorted() function.
def processStats(originalList, header):
    # sorting in descending order
    sortedListByTotalProfit = sorted(originalList, key=operator.itemgetter(11), reverse=True)
    max_item = sortedListByTotalProfit[0]
    print(max_item)

def main():
    fileName = 'Records.csv'
    records = []
    recordHeader = []
    with open(fileName) as f:
        lines = f.readlines()
        for i, line in enumerate(lines):
            if i == 0:  # first line is the header; store it by splitting the first record at commas
                recordHeader = line.split(',')
                continue
            records.append(line.split(","))  # each record's comma-separated fields become a list
    processStats(records, recordHeader)
I guess the data type is string.
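That diagnosis can be demonstrated in isolation: strings compare character by character, so "9…" outranks "1…" in a descending sort (the values below are made up):

```python
vals = ["99992.36", "1000000.00", "250000.10"]

# String sort: compares character by character, so "9" > "2" > "1"
print(sorted(vals, reverse=True))             # ['99992.36', '250000.10', '1000000.00']

# Numeric sort: convert each value with float before comparing
print(sorted(vals, key=float, reverse=True))  # ['1000000.00', '250000.10', '99992.36']
```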
You can use
records.append(list(map(float, line.split(",")))) to replace records.append(line.split(",")) (float rather than int, since the values have decimals; note this assumes every column is numeric).
all code:
def processStats(originalList, header):
    # sorting in descending order
    sortedListByTotalProfit = sorted(originalList, key=operator.itemgetter(11), reverse=True)
    max_item = sortedListByTotalProfit[0]
    print(max_item)

def main():
    fileName = 'Records.csv'
    records = []
    recordHeader = []
    with open(fileName) as f:
        lines = f.readlines()
        for i, line in enumerate(lines):
            if i == 0:  # first line is the header
                recordHeader = line.split(',')
                continue
            records.append(list(map(float, line.split(","))))  # convert each field to a number
    processStats(records, recordHeader)
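Note that converting every field will fail if any of the 14 columns holds text. A safer variant (sketched with made-up rows; column index 11 as in the question) converts only the sort key:

```python
# each row: some text columns plus a numeric "total profit" string at index 11
rows = [
    ["Cereal", "Asia"] + ["x"] * 9 + ["99992.36"],
    ["Fruit", "Europe"] + ["x"] * 9 + ["1000000.00"],
]

# convert just the key field for comparison, leaving the rows untouched
sortedListByTotalProfit = sorted(rows, key=lambda r: float(r[11]), reverse=True)
print(sortedListByTotalProfit[0][11])  # 1000000.00
```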

Reading a list, splitting it, then making it into 3 new lists

I have a file that includes a list of every zip code, city, and state in the US. When I read a line it looks like "'00501', 'Huntsville', 'NY'".
So what I'm trying to do in Python is:
open the file, read every single line, split the lines, then create 3 new lists (zip, city, state) and place all the data from the original list into the new lists.
So far, I have the following code:
def main():
    zipcode = []
    city = []
    state = []
    file_object = open("zipcodes.txt", "r")
    theList = file_object.readlines()
    splitlist = theList.split(',')
    zipcode.append(splitlist[0])
    city.append(splitlist[1])
    state.append(splitlist[2])
    file_object.close()
You have the basics; you are just missing a loop so the process repeats for each line in the file:
theList = file_object.readlines()
for line in theList:
    splitlist = line.split(',')
    zipcode.append(splitlist[0])
    city.append(splitlist[1])
    state.append(splitlist[2])
Keep in mind that readlines() returns the entire file so your theList contains the entire file's contents. You just have to loop over it.
Here is a different version you can try:
def main():
    zips = []
    cities = []
    states = []
    with open('zipcodes.txt', 'r') as f:
        for line in f:
            bits = line.split(',')
            zips.append(bits[0])
            cities.append(bits[1])
            states.append(bits[2])
The with statement is a way to read files; it ensures that files are automatically closed.
The for line in f: loop is what you were missing in your code - this will go through each line.
The body of the for loop is exactly what you have written, so that part you had down well.

Check string in dictionary for keywords from 2 separate lists

I have a dictionary with keys as IDs and values as strings. I also have two separate lists with keywords.
I need to filter out all keys in the dictionary whose values contain at least one keyword from list 1 and at least one keyword from list 2.
I am confused about how to go about this. Please help.
So far, this is what I have:
# loads all data from al.csv into a dictionary where the key is column 1 (the tweet ID)
# and the value is the whole row, including the tweet ID
reader = csv.reader(open('al.csv', 'r'))
overallDict = {}
for rows in reader:
    k = rows[0]
    v = rows[0] + ',' + rows[1] + ',' + rows[2] + ',' + rows[3] + ',' + rows[4] + ',' + rows[5] + ',' + rows[6] + ',' + rows[7] + ',' + rows[8] + ',' + rows[9]
    overallDict[k] = v
# the following loop loads the keywords list
with open('slangNames.txt') as f:
    slangs = f.readlines()
# strip newlines to prepare the finished keywords list
strippedSlangs = []
for elements in slangs:
    elements = elements.strip()
    strippedSlangs.append(elements)
# the following loop loads the risks list
with open('riskNames.txt') as f:
    risks = f.readlines()
# strip newlines to prepare the finished risks list
strippedRisks = []
for things in risks:
    things = things.strip()
    strippedRisks.append(things)
Say list1 = ['opium', 'christmas', 'weed'],
list2 = ['drug', 'harmful', 'bad'],
and the dictionary = {213432: 'opium is harmful for health', 321234: 'christmas is good', 543678: 'weed is bad'}.
The desired output needs to be the list:
Output: [213432, 543678], because these two corresponding tweets contain at least one value from list1 and one from list2.
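The filter described above can be sketched directly on the example data (simple word-by-word matching; real tweets would need lowercasing and tokenization):

```python
list1 = {"opium", "christmas", "weed"}
list2 = {"drug", "harmful", "bad"}
tweets = {
    213432: 'opium is harmful for health',
    321234: 'christmas is good',
    543678: 'weed is bad',
}

def matches_both(text):
    # at least one word from each keyword set must appear in the text
    words = set(text.split())
    return bool(words & list1) and bool(words & list2)

result = [tweet_id for tweet_id, text in tweets.items() if matches_both(text)]
print(result)  # [213432, 543678]
```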
Firstly, I had to rewrite your code to figure out more easily what it was doing:
strippedRisks = set()
strippedSlangs = set()
overallDict = {}

with open('al.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        overallDict[row[0]] = ",".join(row[1:])

with open('slangNames.txt') as f:
    for line in f:
        elements = line.strip()
        strippedSlangs.add(elements)

with open('riskNames.txt') as f:
    for line in f:
        things = line.strip()
        strippedRisks.add(things)
Okay. It sounds like you want to know which keys in your dictionary have values containing words from each of the lists? In other words, you want to know which values of the dictionary contain a word that is disallowed.
You can probably do something like this:
for key, value in overallDict.items():
    if set(value.split(',')).intersection(strippedSlangs):
        pass  # some words appear in strippedSlangs
    elif set(value.split(',')).intersection(strippedRisks):
        pass  # some words appear in strippedRisks
However, now that I see what you want to do, I'd just use sets from the beginning and build the not-permitted words first:
strippedRisks = set()
strippedSlangs = set()
overallDict = {}

with open('slangNames.txt') as f:
    for line in f:
        elements = line.strip()
        strippedSlangs.add(elements)

with open('riskNames.txt') as f:
    for line in f:
        things = line.strip()
        strippedRisks.add(things)

with open('al.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        values = set(row[1:])
        if strippedRisks.intersection(values) and strippedSlangs.intersection(values):
            # words in both keyword lists; do we skip these or save them?
            pass
        else:
            overallDict[row[0]] = values
I believe that's what you're trying to accomplish, but I'm not totally sure.
