Python 3: Getting information from list in list

I have a text file like this (the cities are just examples):
Tokyo 0 267 308 211 156 152 216 27 60 70 75
London 267 0 155 314 111 203 101 258 254 199 310
Paris 308 155 0 429 152 315 216 330 295 249 351
Vienna 211 314 429 0 299 116 212 184 271 265 252
Tallinn 156 111 152 299 0 183 129 178 143 97 199
Helsinki 152 203 313 116 183 0 99 126 212 151 193
Stockholm 216 101 216 212 129 99 0 189 252 161 257
Moscow 27 258 330 184 178 126 189 0 87 73 68
Riga 60 254 295 271 143 212 252 87 0 91 71
Melbourne 70 199 249 265 97 151 161 73 91 0 128
Oslo 75 310 351 252 199 193 257 68 71 128 0
I want the program to work like this, for example:
Please enter starting point: Paris
Now please enter ending point: Riga
Distance between Paris and Riga is 295 km.
I'm fairly new to Python and I don't know how to read a distance from a list nested inside a list.
What I have managed to do so far:
cities = []
distances = []
file = open("cities.txt")
for city_info in file:
    city_info = city_info.strip()
    city = city_info.split()
    cities.append(city[0])
    distances2 = []
    for dist in city[1:]:
        distances2.append(int(dist))
    distances.append(distances2)
# to check, if lists are good to go
print(distances)
print(cities)
file.close()
amount = len(cities)
for x in range(amount):
    for y in range(amount):
        startpoint = cities[x]
        endpoint = cities[y]
        dist1 = distances[x][y]
startpoint = input("Enter start point: ").capitalize()
if startpoint not in cities:
    print("Start point doesn't exist in our database: ", startpoint)
else:
    endpoint = input("Enter end point: ").capitalize()
    if endpoint not in cities:
        print("Start point doesn't exist in our database: ", endpoint)
    else:
        print("Distance between", startpoint, "and", endpoint, "is", dist1, "kilometers.")
As I'm not very experienced with Python, I don't know what I'm doing wrong.
For example, to get the distance between cities[1] and cities[4], the program should look it up in distances[1][4].

Try this:
# reading from file:
with open('cities.txt') as f:
    lines = f.readlines()
# pre-processing
indices = {line.split()[0]: i for i, line in enumerate(lines)}
distances = [line.split()[1:] for line in lines]
# user input:
start = input("Please enter starting point: ")
end = input("Now please enter ending point: ")
# evaluation:
distance = distances[indices[start]][indices[end]]
# output:
print("Distance between {start} and {end} is {distance} km.".format(**locals()))

Another approach, not too different from the other answer.
cities = {}
with open('cities.txt') as f:
    for i, line in enumerate(f.read().splitlines()):
        vals = line.split()
        cities[vals[0]] = {'index': i, 'distances': [int(i) for i in vals[1:]]}
startpoint = input("Enter start point: ").capitalize()
if startpoint in cities:
    endpoint = input("Enter end point: ").capitalize()
    if endpoint in cities:
        index = cities[startpoint]['index']
        distance = cities[endpoint]['distances'][index]
        print('The distance from %s to %s is %d' % (startpoint, endpoint, distance))
    else:
        print('city %s does not exist' % endpoint)
else:
    print('city %s does not exist' % startpoint)
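
As the question's code appears to be structured, the nested loop finishes before any input is read, so dist1 always holds the last entry of the matrix. Both answers avoid this by looking the distance up after the input is known. The same idea can be sketched with a dictionary of dictionaries, so the lookup reads as table[start][end]. This is only a minimal sketch: load_distances and the three-city sample are hypothetical names and data chosen for illustration, and the sample stands in for the lines of cities.txt.

```python
def load_distances(lines):
    """Build {city: {other_city: distance}} from 'Name d1 d2 ...' lines."""
    names, rows = [], []
    for line in lines:
        parts = line.split()
        if not parts:            # skip blank lines
            continue
        names.append(parts[0])
        rows.append([int(p) for p in parts[1:]])
    # Column j of every row belongs to the j-th city in the file.
    return {name: dict(zip(names, row)) for name, row in zip(names, rows)}


# Hypothetical three-city excerpt of the file shown in the question.
sample = [
    "Tokyo 0 267 308",
    "London 267 0 155",
    "Paris 308 155 0",
]
table = load_distances(sample)
print("Distance between Paris and London is", table["Paris"]["London"], "km.")
```

With the real file, passing the open file object instead of sample should work the same way, since iterating a file yields its lines.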

Related

Finding Common Elements (Amazon SDE-1)

Given two lists V1 and V2 of sizes n and m respectively, return the list of elements common to both lists, in sorted order. Duplicates may appear in the output list.
Link to the problem : LINK
Example:
Input:
5
3 4 2 2 4
4
3 2 2 7
Output:
2 2 3
Explanation:
The first list is {3 4 2 2 4}, and the second list is {3 2 2 7}.
The common elements in sorted order are {2 2 3}
Expected Time complexity : O(N)
My code:
class Solution:
    def common_element(self,v1,v2):
        dict1 = {}
        ans = []
        for num1 in v1:
            dict1[num1] = 0
        for num2 in v2:
            if num2 in dict1:
                ans.append(num2)
        return sorted(ans)
Problem with my code:
Dictionary lookups take constant time, so this reduced my time complexity, but one of the hidden test cases is failing. My logic is simple and straightforward, and everything seems to be correct. What's your take? Is the logic wrong, or is the problem description missing some vital details?
New Approach
Now I am building two hashmaps/dictionaries, one for each array. If a number is present in both arrays, I take the minimum of its two frequencies and append the number to ans that many times.
class Solution:
    def common_element(self,arr1,arr2):
        dict1 = {}
        dict2 = {}
        ans = []
        for num1 in arr1:
            dict1[num1] = 0
        for num1 in arr1:
            dict1[num1] += 1
        for num2 in arr2:
            dict2[num2] = 0
        for num2 in arr2:
            dict2[num2] += 1
        for number in dict1:
            if number in dict2:
                minFreq = min(dict1[number],dict2[number])
                for _ in range(minFreq):
                    ans.append(number)
        return sorted(ans)
The code outputs nothing for this test case:
Input:
64920
83454 38720 96164 26694 34159 26694 51732 64378 41604 13682 82725 82237 41850 26501 29460 57055 10851 58745 22405 37332 68806 65956 24444 97310 72883 33190 88996 42918 56060 73526 33825 8241 37300 46719 45367 1116 79566 75831 14760 95648 49875 66341 39691 56110 83764 67379 83210 31115 10030 90456 33607 62065 41831 65110 34633 81943 45048 92837 54415 29171 63497 10714 37685 68717 58156 51743 64900 85997 24597 73904 10421 41880 41826 40845 31548 14259 11134 16392 58525 3128 85059 29188 13812.................
Its Correct output is:
4 6 9 14 17 19 21 26 28 32 33 42 45 54 61 64 67 72 77 86 93 108 113 115 115 124 129 133 135 137 138 141 142 144 148 151 154 160 167 173 174 192 193 195 198 202 205 209 215 219 220 221 231 231 233 235 236 238 239 241 245 246 246 247 254 255 257 262 277 283 286 290 294 298 305 305 307 309 311 312 316 319 321 323 325 325 326 329 329 335 338 340 341 350 353 355 358 364 367 369 378 385 387 391 401 404 405 406 406 410 413 416 417 421 434 435 443 449 452 455 456 459 460 460 466 467 469 473 482 496 503 .................
And Your Code's output is:

Please find the solution below:
def sorted_common_elemen(v1, v2):
    res = []
    for elem in v2:
        # keep elem only while v1 still holds an unmatched copy of it
        if elem in v1:
            res.append(elem)
            v1.remove(elem)
    return sorted(res)

Your code ignores the number of times a given element occurs in the list. I think this is a good way to fix that:
class Solution:
    def common_element(self, l0, l1):
        li = []
        for i in l0:
            if i in l1:
                l1.remove(i)
                li.append(i)
        return sorted(li)
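
The list-based fix above scans and removes from a list on every step, which is quadratic in the worst case. The minimum-frequency idea from the question's "New Approach" can also be sketched with collections.Counter, whose & operator keeps each element with the smaller of its two counts. This is a sketch rather than a verified submission for the judge; common_elements is a hypothetical name, and the final sort makes the total cost O(k log k) in the number of common elements, so it is not strictly O(N).

```python
from collections import Counter


def common_elements(v1, v2):
    """Return the sorted multiset intersection of two lists."""
    # Counter & Counter keeps min(count1, count2) for every shared key.
    shared = Counter(v1) & Counter(v2)
    return sorted(shared.elements())


# The example from the problem statement.
print(common_elements([3, 4, 2, 2, 4], [3, 2, 2, 7]))  # [2, 2, 3]
```

Unlike the versions that call remove, this one leaves both input lists unchanged.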

How to use certain rows of a dataframe in a formula

I have multiple data frames, and the same formula needs to be applied to certain sets of rows within each data frame. I have the locations of the sets inside the df, but I don't know how to access those sets.
This is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt #might used/need it later to check the output
df = pd.read_csv('Dalfsen.csv')
l = []
x = []
y = []
#the formula(trendline)
def rechtzetten(x,y):
    a = (len(x)*sum(x*y)- sum(x)*sum(y))/(len(x)*sum(x**2)-sum(x)**2)
    b = (sum(y)-a*sum(x))/len(x)
    y1 = x*a+b
    print(y1)
METING = df.ID.str.contains("<METING>") #locating the sets
indicatie = np.where(METING == False)[0] #and saving them somewhere
if n in df[n] != indicatie & n+1 != indicatie: #attempt to add parts of the set in l
    append.l
elif n in df[n] != indicatie & n+1 == indicatie: #attempt defining the end of the set and using the formula for the set
    append.l
    rechtzetten(l.x, l.y)
else: #emptying the storage for the new set
    l = []
indicatie has the following numbers:
0 12 13 26 27 40 41 53 54 66 67 80 81 94 95 108 109 121
122 137 138 149 150 162 163 177 178 190 191 204 205 217 218 229 230 242
243 255 256 268 269 291 292 312 313 340 341 373 374 401 402 410 411 420
421 430 431 449 450 468 469 487 488 504 505 521 522 538 539 558 559 575
576 590 591 604 605 619 620 633 634 647
Because my df looks like this:
ID,NUM,x,y,nap,abs,end
<PROFIEL>not used data
<METING>data</METING>
<METING>data</METING>
...
<METING>data</METING>
<METING>data</METING>
</PROFIEL>,,,,,,
<PROFIEL>not used data
...
</PROFIEL>,,,,,,
tl;dr I'm trying to apply a formula to each profile shown above. I want to edit the data between two consecutive numbers of the list indicatie.
For example:
apply the function rechtzetten(x,y) to df.x[1:11] and df.y[1:11] (because [0] and [12] are in the list indicatie), then do the same for [14:25], and so on.
What I am trying to avoid is typing the following hundreds of times manually:
x_#=df.x[1:11]
y_#=df.y[1:11]
rechtzetten(x_#,y_#)

I can't understand your question clearly, but if you want to replace a specific column of your pandas dataframe with a numpy array, you can simply assign it:
df['Column'] = numpy_array
Can you be more clear?
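
One way to read the question is: for every pair of consecutive entries in indicatie, take the rows strictly between them and fit the trendline to that slice. The sketch below follows that reading using plain Python lists so that it runs without pandas; fit_line and segments are hypothetical helper names, the tiny x, y and boundaries values are made up, and a slice whose x values are all equal would divide by zero. With a DataFrame, something like df.x.iloc[lo:hi] could be passed instead of the list slices.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b, same formula as rechtzetten."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - a * sx) / n
    return a, b


def segments(boundaries):
    """Yield (lo, hi) slice bounds for rows strictly between boundary rows."""
    for start, end in zip(boundaries, boundaries[1:]):
        if end - start > 1:      # adjacent boundaries enclose no data rows
            yield start + 1, end


# Made-up data: None marks the non-measurement rows listed in `boundaries`.
x = [None, 0.0, 1.0, 2.0, None, None, 0.0, 2.0, 4.0, None]
y = [None, 1.0, 3.0, 5.0, None, None, 4.0, 3.0, 2.0, None]
boundaries = [0, 4, 5, 9]

fits = [fit_line(x[lo:hi], y[lo:hi]) for lo, hi in segments(boundaries)]
print(fits)  # [(2.0, 1.0), (-0.5, 4.0)]
```

Adjacent boundaries such as 12 and 13 in the question's list produce no segment, which matches the profile end tag being followed directly by the next profile start tag.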

How to print increasing order in dictionary's?

My program is a calorie counter that reads foods and their calories from a text file, and reads each person's name plus the foods that person ate from standard input. At the end, the program should output each name with its total calories, lowest total first. The values are printed correctly, but they aren't in the correct order.
Does anyone know why this is happening?
import sys
file = "foods.txt"
line = sys.stdin
fridge = {}
with open(file, "r") as f:
    for a in f:
        a = a.strip().split()
        food = " ".join(a[:-1])
        calorie = a[-1]
        fridge[food] = calorie
for i in line:
    i = i.strip().split(",")
    name = i[0]
    foods = i[1:]
    total_calories = 0
    for k in foods:
        calorie = fridge.get(k)
        if k in fridge:
            total_calories += int(calorie)
    print("{} : {}".format(name, total_calories))
#My_Output
#joe : 2375
#mary : 785
#sandy : 2086
#trudy : 875
#Expected_Output
#trudy : 875
#mary : 985
#sandy : 2186
#joe : 2375
#foods.txt
#almonds 795
#apple pie 405
#asparagus 15
#avocdo 340
#banana 105
#blackberries 75
#blue cheese 100
#blueberries 80
#muffin 135
#blueberry pie 380
#broccoli 40
#butter 100
#cabbage 15
#carrot cake 385
#cheddar cheese 115
#cheeseburger 525
#cherry pie 410
#chicken noodle soup 75
#chocolate chip cookie 185
#cola 160
#cranberry juice 145
#croissant 235
#danish pastry 235
#egg 75
#grapefruit juice 95
#ice cream 375
#lamb 315
#lemon meringue pie 355
#lettuce 5
#macadamia nuts 960
#mayonnaise 100
#mixed grain bread 65
#orange juice 110
#potatoes 120
#pumpkin pie 320
#rice 230
#salmon 150
#spaghetti 190
#spinach 55
#strawberries 45
#taco 195
#tomatoes 25
#tuna 135
#veal 230
#waffles 205
#watermelon 50
#white bread 65
#wine 75
#yogurt 230
#zuchini 16
#sys.stdin
#joe,almonds,almonds,blue cheese,cabbage,mayonnaise,cherry pie,cola
#mary,apple pie,avocado,broccoli,butter,danish pastry,lettuce,apple
#sandy,zuchini,yogurt,veal,tuna,taco,pumpkin pie,macadamia nuts,brazil nuts
#trudy,waffles,waffles,waffles,chicken noodle soup,chocolate chip cookie

In the last statement of your code, instead of printing, append the values to a list l defined beforehand, using l.append((name, total_calories)).
Then define a takeSecond function
def takeSecond(elem):
    return elem[1]
and sort the list using
l.sort(key=takeSecond)
Printing the entries of the sorted list should then give the required result.
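
The collect-then-sort idea can be sketched end to end as follows. The fridge and orders values are a made-up excerpt rather than the full files from the question, and total_calories is written here as a hypothetical helper function; unknown foods count as 0, mirroring the `if k in fridge` check in the question.

```python
def total_calories(fridge, foods):
    """Sum the calories of the known foods; unknown foods contribute 0."""
    return sum(fridge.get(food, 0) for food in foods)


# Made-up excerpt of the data in the question.
fridge = {"waffles": 205, "cola": 160, "almonds": 795}
orders = {
    "joe": ["almonds", "almonds", "cola"],
    "trudy": ["waffles", "unknown food"],
}

totals = [(name, total_calories(fridge, foods)) for name, foods in orders.items()]
# Sort by the second tuple element (the total), lowest first.
totals.sort(key=lambda item: item[1])

for name, total in totals:
    print("{} : {}".format(name, total))
```

The lambda plays the same role as takeSecond; operator.itemgetter(1) would be another equivalent key.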

How to make a column of numbers increase from a certain value in python

I have a txt file like this:
127 181
151 188
120 201
148 207
148 212
145 215
86 219
108 219
67 239
I want the second column to be renumbered in increasing order starting from 180, with a repeated number counted only once (rows that share a value get the same new number).
My expected results are as follows:
127 180
151 181
120 182
148 183
148 184
145 185
86 186
108 186
67 187
Can someone give me some advice? Thanks.

If you are open to using pandas:
df = pd.read_csv('textfile.txt', header=None, sep=' ')
startvalue = 180
df[1] = np.arange(startvalue, startvalue+len(df)) - df[1].duplicated().cumsum()
df.to_csv('textfile_out.txt', sep=' ', index=False, header=False)
Full example (with imports and textfile-creation):
import pandas as pd
import numpy as np
with open('textfile.txt', 'w') as f:
    f.write('''\
127 181
151 188
120 201
148 207
148 212
145 215
86 219
108 219
67 239''')
df = pd.read_csv('textfile.txt', header=None, sep=' ')
startvalue = 180
df[1] = np.arange(startvalue, startvalue+len(df)) - df[1].duplicated().cumsum()
df.to_csv('textfile_out.txt', sep=' ', index=False, header=False)
Output:
127 180
151 181
120 182
148 183
148 184
145 185
86 186
108 186
67 187

Without using any library, I suggest this approach. Create a dictionary that maps each old value to its new value, and iterate over the column values.
n = 180
new_dict = {}
for index, value in enumerate(column):
    if value in new_dict.keys():
        column[index] = new_dict[value]
    else:
        new_dict[value] = n
        column[index] = n
        n += 1
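
The snippet above assumes a list named column already exists. A self-contained sketch of the same dictionary idea, including the parsing step, might look like this; renumber is a hypothetical name, and the text is embedded as a string here instead of being read from the file.

```python
def renumber(rows, start=180):
    """Replace each second-column value by a running number from `start`.

    Equal old values map to the same new value, so repeats do not
    advance the counter.
    """
    mapping = {}
    result = []
    for first, second in rows:
        if second not in mapping:
            mapping[second] = start + len(mapping)
        result.append((first, mapping[second]))
    return result


text = """\
127 181
151 188
120 201
148 207
148 212
145 215
86 219
108 219
67 239"""

rows = [tuple(int(part) for part in line.split()) for line in text.splitlines()]
for first, second in renumber(rows):
    print(first, second)
```

Using start + len(mapping) removes the need for a separate counter variable, because the dictionary size already says how many distinct values have been seen.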

Index Error when using python to read BLAST output in csv format

Apologies for the long question. I have been trying to solve this bug but I can't work out what I'm doing wrong! I have included an example of the data so you can see what I'm working with.
I have data output from a BLAST search as below:
# BLASTN 2.2.29+
# Query: Cryptocephalus androgyne
# Database: SANfive
# Fields: query id subject id % identity alignment length mismatches gap opens q. start q. end s. start s. end evalue bit score
# 7 hits found
Cryptocephalus M00964:19:000000000-A4YV1:1:2110:23842:21326 99.6 250 1 0 125 374 250 1 1.00E-128 457
Cryptocephalus M00964:19:000000000-A4YV1:1:1112:19704:18005 85.37 246 36 0 90 335 246 1 4.00E-68 255
Cryptocephalus M00964:19:000000000-A4YV1:1:2106:14369:15227 77.42 248 50 3 200 444 245 1 3.00E-34 143
Cryptocephalus M00964:19:000000000-A4YV1:1:2102:5533:11928 78.1 137 30 0 3 139 114 250 2.00E-17 87.9
Cryptocephalus M00964:19:000000000-A4YV1:1:1110:28729:12868 81.55 103 19 0 38 140 104 2 6.00E-17 86.1
Cryptocephalus M00964:19:000000000-A4YV1:1:1113:11427:16440 78.74 127 27 0 3 129 124 250 6.00E-17 86.1
Cryptocephalus M00964:19:000000000-A4YV1:1:2110:12170:20594 78.26 115 25 0 3 117 102 216 1.00E-13 75
# BLASTN 2.2.29+
# Query: Cryptocephalus aureolus
# Database: SANfive
# Fields: query id subject id % identity alignment length mismatches gap opens q. start q. end s. start s. end evalue bit score
# 10 hits found
Cryptocephalus M00964:19:000000000-A4YV1:1:2111:20990:19930 97.2 250 7 0 119 368 250 1 1.00E-118 424
Cryptocephalus M00964:19:000000000-A4YV1:1:1105:20676:23942 86.89 206 27 0 5 210 209 4 7.00E-61 231
Cryptocephalus M00964:19:000000000-A4YV1:1:1113:6534:23125 97.74 133 3 0 1 133 133 1 3.00E-60 230
Cryptocephalus M00964:21:000000000-A4WJV:1:2104:11955:19015 89.58 144 15 0 512 655 1 144 2.00E-46 183
Cryptocephalus M00964:21:000000000-A4WJV:1:1109:14814:10240 88.28 128 15 0 83 210 11 138 2.00E-37 154
Cryptocephalus M00964:21:000000000-A4WJV:1:1105:4530:13833 79.81 208 42 0 3 210 211 4 6.00E-37 152
Cryptocephalus M00964:19:000000000-A4YV1:1:2108:13133:14967 98.7 77 1 0 1 77 77 1 2.00E-32 137
Cryptocephalus M00964:19:000000000-A4YV1:1:1109:14328:3682 100 60 0 0 596 655 251 192 1.00E-24 111
Cryptocephalus M00964:19:000000000-A4YV1:1:1105:19070:25181 100 53 0 0 1 53 53 1 8.00E-21 99
Cryptocephalus M00964:19:000000000-A4YV1:1:1109:20848:27419 100 28 0 0 1 28 28 1 6.00E-07 52.8
# BLASTN 2.2.29+
# Query: Cryptocephalus cynarae
# Database: SANfive
# Fields: query id subject id % identity alignment length mismatches gap opens q. start q. end s. start s. end evalue bit score
# 2 hits found
Cryptocephalus M00964:21:000000000-A4WJV:1:2107:12228:15885 90.86 175 16 0 418 592 4 178 5.00E-62 235
Cryptocephalus M00964:21:000000000-A4WJV:1:1110:20463:5044 84.52 168 26 0 110 277 191 24 2.00E-41 167
and I have saved this as a CSV file, again shown below:
# BLASTN 2.2.29+,,,,,,,,,,,
# Query: Cryptocephalus androgyne,,,,,,,,,,,
# Database: SANfive,,,,,,,,,,,
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 7 hits found,,,,,,,,,,,
Cryptocephalus,M00964:19:000000000-A4YV1:1:2110:23842:21326,99.6,250,1,0,125,374,250,1,1.00E-128,457
Cryptocephalus,M00964:19:000000000-A4YV1:1:1112:19704:18005,85.37,246,36,0,90,335,246,1,4.00E-68,255
Cryptocephalus,M00964:19:000000000-A4YV1:1:2106:14369:15227,77.42,248,50,3,200,444,245,1,3.00E-34,143
Cryptocephalus,M00964:19:000000000-A4YV1:1:2102:5533:11928,78.1,137,30,0,3,139,114,250,2.00E-17,87.9
Cryptocephalus,M00964:19:000000000-A4YV1:1:1110:28729:12868,81.55,103,19,0,38,140,104,2,6.00E-17,86.1
Cryptocephalus,M00964:19:000000000-A4YV1:1:1113:11427:16440,78.74,127,27,0,3,129,124,250,6.00E-17,86.1
Cryptocephalus,M00964:19:000000000-A4YV1:1:2110:12170:20594,78.26,115,25,0,3,117,102,216,1.00E-13,75
# BLASTN 2.2.29+,,,,,,,,,,,
# Query: Cryptocephalus aureolus,,,,,,,,,,,
# Database: SANfive,,,,,,,,,,,
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 10 hits found,,,,,,,,,,,
Cryptocephalus,M00964:19:000000000-A4YV1:1:2111:20990:19930,97.2,250,7,0,119,368,250,1,1.00E-118,424
Cryptocephalus,M00964:19:000000000-A4YV1:1:1105:20676:23942,86.89,206,27,0,5,210,209,4,7.00E-61,231
Cryptocephalus,M00964:19:000000000-A4YV1:1:1113:6534:23125,97.74,133,3,0,1,133,133,1,3.00E-60,230
Cryptocephalus,M00964:21:000000000-A4WJV:1:2104:11955:19015,89.58,144,15,0,512,655,1,144,2.00E-46,183
Cryptocephalus,M00964:21:000000000-A4WJV:1:1109:14814:10240,88.28,128,15,0,83,210,11,138,2.00E-37,154
Cryptocephalus,M00964:21:000000000-A4WJV:1:1105:4530:13833,79.81,208,42,0,3,210,211,4,6.00E-37,152
Cryptocephalus,M00964:19:000000000-A4YV1:1:2108:13133:14967,98.7,77,1,0,1,77,77,1,2.00E-32,137
Cryptocephalus,M00964:19:000000000-A4YV1:1:1109:14328:3682,100,60,0,0,596,655,251,192,1.00E-24,111
Cryptocephalus,M00964:19:000000000-A4YV1:1:1105:19070:25181,100,53,0,0,1,53,53,1,8.00E-21,99
Cryptocephalus,M00964:19:000000000-A4YV1:1:1109:20848:27419,100,28,0,0,1,28,28,1,6.00E-07,52.8
I have written a short script that goes through the percentage identity and, if it is above a threshold, finds the query ID and adds it to a list, before removing duplicates from the list.
import csv
from pylab import plot,show
#Making a function to see if a string is a number or not
def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False
#Importing the CSV file, using sniffer to check the delimiters used
#In the first 1024 bytes
ImportFile = raw_input("What is the name of your import file? ")
csvfile = open(ImportFile, "rU")
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
#Finding species over 98%
Species98 = []
Species95to97 = []
Species90to94 = []
Species85to89 = []
Species80to84 = []
Species75to79 = []
SpeciesBelow74 = []
for line in reader:
    if is_number(line[2])== True:
        if float(line[2])>=98:
            Species98.append(line[0])
        elif 97>=float(line[2])>=95:
            Species95to97.append(line[0])
        elif 94>=float(line[2])>=90:
            Species90to94.append(line[0])
        elif 89>=float(line[2])>=85:
            Species85to89.append(line[0])
        elif 84>=float(line[2])>=80:
            Species80to84.append(line[0])
        elif 79>=float(line[2])>=75:
            Species75to79.append(line[0])
        elif float(line[2])<=74:
            SpeciesBelow74.append(line[0])
def f7(seq):
    seen = set()
    seen_add = seen.add
    return [ x for x in seq if x not in seen and not seen_add(x)]
Species98=f7(Species98)
print len(Species98), "species over 98"
Species95to97=f7(Species95to97) #removing duplicates
search_set = set().union(Species98)
Species95to97 = [x for x in Species95to97 if x not in search_set]
print len(Species95to97), "species between 95-97"
Species90to94=f7(Species90to94)
search_set = set().union(Species98, Species95to97)
Species90to94 = [x for x in Species90to94 if x not in search_set]
print len(Species90to94), "species between 90-94"
Species85to89=f7(Species85to89)
search_set = set().union(Species98, Species95to97, Species90to94)
Species85to89 = [x for x in Species85to89 if x not in search_set]
print len(Species85to89), "species between 85-89"
Species80to84=f7(Species80to84)
search_set = set().union(Species98, Species95to97, Species90to94, Species85to89)
Species80to84 = [x for x in Species80to84 if x not in search_set]
print len(Species80to84), "species between 80-84"
Species75to79=f7(Species75to79)
search_set = set().union(Species98, Species95to97, Species90to94, Species85to89,Species80to84)
Species75to79 = [x for x in Species75to79 if x not in search_set]
print len(Species75to79), "species between 75-79"
SpeciesBelow74=f7(SpeciesBelow74)
search_set = set().union(Species98, Species95to97, Species90to94, Species85to89,Species80to84, Species75to79)
SpeciesBelow74 = [x for x in SpeciesBelow74 if x not in search_set]
print len(SpeciesBelow74), "species below 74"
#Finding species 95-97%
The script works perfectly most of the time, but every so often I get the error shown below:
File "FindingSpeciesRepresentation.py", line 35, in <module>
if is_number(line[2])== "True":
IndexError: list index out of range
But if I change the script so that it prints line[2], it prints all the identities as I would expect. Do you have any idea what could be going wrong? Again, apologies for the wall of data.
This has been partly taken from my earlier question: Extracting BLAST output columns in CSV form with python
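
The source does not say which row triggers the IndexError. One plausible cause, stated here only as an assumption, is a row with fewer than three fields, such as a blank line or a comment line saved without trailing commas, because line[2] does not exist for such a row. A defensive sketch in Python 3 (the question's script is Python 2) could check the row length before indexing. The function name identities and the sample rows, including the "# 0 hits found" line, are hypothetical.

```python
import csv
import io


def identities(rows, column=2):
    """Yield (query_id, identity) for rows with a numeric value in `column`."""
    for row in rows:
        if len(row) <= column:       # blank or short row: nothing to index
            continue
        if row[0].startswith("#"):   # BLAST comment/header row
            continue
        try:
            value = float(row[column])
        except ValueError:
            continue
        yield row[0], value


# Hypothetical sample: a padded comment row, a blank line, a short comment
# row without trailing commas, and one real hit from the question's data.
sample = io.StringIO(
    "# BLASTN 2.2.29+,,,,,,,,,,,\n"
    "\n"
    "# 0 hits found\n"
    "Cryptocephalus,M00964:19:000000000-A4YV1:1:2110:23842:21326,"
    "99.6,250,1,0,125,374,250,1,1.00E-128,457\n"
)
hits = list(identities(csv.reader(sample)))
print(hits)  # [('Cryptocephalus', 99.6)]
```

Printing the row number and the offending row inside the length check would be a quick way to confirm or rule out this guess against the real file.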
