Independent arrays interfered with each other? - python

There's a given data set of two columns: EmployeeCode and Surname.
The format is like:
EmployeeCode[1] = "L001"
Surname[1] = "Pollard"
EmployeeCode[2] = "L002"
Surname[2] = "Wills"
...
What I was trying to do is to sort according to lexicographic order for each column so as to facilitate implementation of binary search later on.
This is my code:
#data set
EmployeeCode, Surname = [0]*33, [0]*33
EmployeeCode[1] = "L001"
Surname[1] = "Pollard"
EmployeeCode[2] = "L002"
Surname[2] = "Wills"
EmployeeCode[3] = "L007"
Surname[3] = "Singh"
EmployeeCode[4] = "L008"
Surname[4] = "Yallop"
EmployeeCode[5] = "L009"
Surname[5] = "Adams"
EmployeeCode[6] = "L013"
Surname[6] = "Davies"
EmployeeCode[7] = "L014"
Surname[7] = "Patel"
EmployeeCode[8] = "L021"
Surname[8] = "Kelly"
EmployeeCode[9] = "S001"
Surname[9] = "Ong"
EmployeeCode[10] = "S002"
Surname[10] = "Goh"
EmployeeCode[11] = "S003"
Surname[11] = "Ong"
EmployeeCode[12] = "S004"
Surname[12] = "Ang"
EmployeeCode[13] = "S005"
Surname[13] = "Wong"
EmployeeCode[14] = "S006"
Surname[14] = "Teo"
EmployeeCode[15] = "S007"
Surname[15] = "Ho"
EmployeeCode[16] = "S008"
Surname[16] = "Chong"
EmployeeCode[17] = "S009"
Surname[17] = "Low"
EmployeeCode[18] = "S010"
Surname[18] = "Sim"
EmployeeCode[19] = "S011"
Surname[19] = "Tay"
EmployeeCode[20] = "S012"
Surname[20] = "Tay"
EmployeeCode[21] = "S013"
Surname[21] = "Chia"
EmployeeCode[22] = "S014"
Surname[22] = "Tan"
EmployeeCode[23] = "S015"
Surname[23] = "Yeo"
EmployeeCode[24] = "S016"
Surname[24] = "Lim"
EmployeeCode[25] = "S017"
Surname[25] = "Tan"
EmployeeCode[26] = "S018"
Surname[26] = "Ng"
EmployeeCode[27] = "S018"
Surname[27] = "Lim"
EmployeeCode[28] = "S019"
Surname[28] = "Toh"
EmployeeCode[29] = "N011"
Surname[29] = "Morris"
EmployeeCode[30] = "N013"
Surname[30] = "Williams"
EmployeeCode[31] = "N016"
Surname[31] = "Chua"
EmployeeCode[32] = "N023"
Surname[32] = "Wong"
#sort based on value of main array
def bubble_sort(main, second):
sort = True
passed = len(main)-1
while sort:
sort = False
i = 2
while i<= passed:
#print(main[i],main[i-1],i)
if main[i] < main[i-1]:
main[i], main[i-1] = main[i-1], main[i]
second[i], second[i-1] = second[i-1], second[i]
sort = True
i += 1
passed -= 1
return main,second
#main
#prepare sorted array for binary search
#for search by surname, sort according to surname
sName,sCode = bubble_sort(Surname,EmployeeCode)
print("**BEFORE******")
for k in range(0,33):
print(sName[k],sCode[k])
print("*BEFORE*******")
#for search by ECode, sort according to ECode
cCode,cName = bubble_sort(EmployeeCode, Surname)
print("**AFTER******")
for k in range(0,33):
print(sName[k],sCode[k])
print("**AFTER******")
However, after the 2nd time sorting, the 1st time sorting result in sName and sCode just changed by themselves. I've never manually changed it.
BEFORE(1st sorting)
**BEFORE******
0 0
Adams L009
Ang S004
Chia S013
Chong S008
Chua N016
Davies L013
Goh S002
Ho S007
Kelly L021
Lim S016
Lim S018
Low S009
Morris N011
Ng S018
Ong S001
Ong S003
Patel L014
Pollard L001
Sim S010
Singh L007
Tan S014
Tan S017
Tay S011
Tay S012
Teo S006
Toh S019
Williams N013
Wills L002
Wong S005
Wong N023
Yallop L008
Yeo S015
*BEFORE*******
AFTER(2nd sorting, see last 4 items)
**AFTER******
0 0
Pollard L001
Wills L002
Singh L007
Yallop L008
Adams L009
Davies L013
Patel L014
Kelly L021
Morris N011
Williams N013
Chua N016
Wong N023
Ong S001
Goh S002
Ong S003
Ang S004
Wong S005
Teo S006
Ho S007
Chong S008
Low S009
Sim S010
Tay S011
Tay S012
Chia S013
Tan S014
Yeo S015
Lim S016
Tan S017
Lim S018
Ng S018
Toh S019
Can anyone tell me how could this happened?

Assignments and argument passing in Python won't ever create copies of objects. When you bubblesort your lists, the lists you pass into the bubble sort are the same exact list objects in memory as the lists the bubble sort passes back out. Surname, sName, and cName are the exact same objects, and when you do the second bubble sort, you modify sName and sCode instead of creating independent, sorted lists.
If you want to copy a list, you have to do so explicitly. This is a shallow copy:
new_list = original[:]
new_list will be a new list containing the same objects original contained.
This is a deep copy:
import copy
new_list = copy.deepcopy(original)
new_list will be a new list containing deep copies of the objects original contained. (This is sometimes too deep; for example, if you have a list of lists, you sometimes don't want to copy the objects inside the inner lists.)
Finally, I'd like to note that your initialization code is painfully verbose. You can use a list literal instead of creating a list full of zeros and assigning each element separately:
EmployeeCode = [
'Pollard',
'Wills',
...
]

Related

Replace dot product for loop Numpy

I am trying to replace the dot product for loop using something faster like NumPy
I did research on dot product and kind of understand and can get it working with toy data in a few ways in but not 100% when it comes to implementing it for actual use with a data frame.
I looked at these and other SO threads to no luck avoide loop dot product, matlab and dot product subarrays without for loop and multiple numpy dot products without a loop
looking to do something like this which works with toy numbers in np array
u1 =np.array([1,2,3])
u2 =np.array([2,3,4])
v1.dot(v2)
20
u1 =np.array([1,2,3])
u2 =np.array([2,3,4])
(u1 * u2).sum()
20
u1 =np.array([1,2,3])
u2 =np.array([2,3,4])
sum([x1*x2 for x1, x2 in zip (u1, u2)])
20
this is the current working get dot product
I would like to do this with out the for loop
def get_dot_product(self, courseid1, courseid2, unit_vectors):
u1 = unit_vectors[courseid1]
u2 = unit_vectors[courseid2]
dot_product = 0.0
for dimension in u1:
if dimension in u2:
dot_product += u1[dimension] * u2[dimension]
return dot_product
** code**
#!/usr/bin/env python
# coding: utf-8
class SearchRecommendationSystem:
def __init__(self):
pass
def get_bag_of_words(self, titles_lines):
bag_of_words = {}
for index, row in titles_lines.iterrows():
courseid, course_bag_of_words = self.get_course_bag_of_words(row)
for word in course_bag_of_words:
word = str(word).strip() # added
if word not in bag_of_words:
bag_of_words[word] = course_bag_of_words[word]
else:
bag_of_words[word] += course_bag_of_words[word]
return bag_of_words
def get_course_bag_of_words(self, line):
course_bag_of_words = {}
courseid = line['courseid']
title = line['title'].lower()
description = line['description'].lower()
wordlist = title.split() + description.split()
if len(wordlist) >= 10:
for word in wordlist:
word = str(word).strip() # added
if word not in course_bag_of_words:
course_bag_of_words[word] = 1
else:
course_bag_of_words[word] += 1
return courseid, course_bag_of_words
def get_sorted_results(self, d):
kv_list = d.items()
vk_list = []
for kv in kv_list:
k, v = kv
vk = v, k
vk_list.append(vk)
vk_list.sort()
vk_list.reverse()
k_list = []
for vk in vk_list[:10]:
v, k = vk
k_list.append(k)
return k_list
def get_keywords(self, titles_lines, bag_of_words):
n = sum(bag_of_words.values())
keywords = {}
for index, row in titles_lines.iterrows():
courseid, course_bag_of_words = self.get_course_bag_of_words(row)
term_importance = {}
for word in course_bag_of_words:
word = str(word).strip() # extra
tf_course = (float(course_bag_of_words[word]) / sum(course_bag_of_words.values()))
tf_overall = float(bag_of_words[word]) / n
term_importance[word] = tf_course / tf_overall
keywords[str(courseid)] = self.get_sorted_results(term_importance)
return keywords
def get_inverted_index(self, keywords):
inverted_index = {}
for courseid in keywords:
for keyword in keywords[courseid]:
if keyword not in inverted_index:
keyword = str(keyword).strip() # added
inverted_index[keyword] = []
inverted_index[keyword].append(courseid)
return inverted_index
def get_search_results(self, query_terms, keywords, inverted_index):
search_results = {}
for term in query_terms:
term = str(term).strip()
if term in inverted_index:
for courseid in inverted_index[term]:
if courseid not in search_results:
search_results[courseid] = 0.0
search_results[courseid] += (
1 / float(keywords[courseid].index(term) + 1) *
1 / float(query_terms.index(term) + 1)
)
sorted_results = self.get_sorted_results(search_results)
return sorted_results
def get_titles(self, titles_lines):
titles = {}
for index, row in titles_lines.iterrows():
titles[row['courseid']] = row['title'][:60]
return titles
def get_unit_vectors(self, keywords, categories_lines):
norm = 1.884
cat = {}
subcat = {}
for line in categories_lines[1:]:
courseid_, category, subcategory = line.split('\t')
cat[courseid_] = category.strip()
subcat[courseid_] = subcategory.strip()
unit_vectors = {}
for courseid in keywords:
u = {}
if courseid in cat:
u[cat[courseid]] = 1 / norm
u[subcat[courseid]] = 1 / norm
for keyword in keywords[courseid]:
u[keyword] = (1 / float(keywords[courseid].index(keyword) + 1) / norm)
unit_vectors[courseid] = u
return unit_vectors
def get_dot_product(self, courseid1, courseid2, unit_vectors):
u1 = unit_vectors[courseid1]
u2 = unit_vectors[courseid2]
dot_product = 0.0
for dimension in u1:
if dimension in u2:
dot_product += u1[dimension] * u2[dimension]
return dot_product
def get_recommendation_results(self, seed_courseid, keywords, inverted_index, unit_vectors):
courseids = []
seed_courseid = str(seed_courseid).strip()
for keyword in keywords[seed_courseid]:
for courseid in inverted_index[keyword]:
if courseid not in courseids and courseid != seed_courseid:
courseids.append(courseid)
dot_products = {}
for courseid in courseids:
dot_products[courseid] = self.get_dot_product(seed_courseid, courseid, unit_vectors)
sorted_results = self.get_sorted_results(dot_products)
return sorted_results
def Final(self):
print("Reading Title file.......")
titles_lines = open('s2-titles.txt', encoding="utf8").readlines()
print("Reading Category file.......")
categories_lines = open('s2-categories.tsv', encoding = "utf8").readlines()
print("Getting Supported Functions Data")
bag_of_words = self.get_bag_of_words(titles_lines)
keywords = self.get_keywords(titles_lines, bag_of_words)
inverted_index = self.get_inverted_index(keywords)
titles = self.get_titles(titles_lines)
print("Getting Unit Vectors")
unit_vectors = self.get_unit_vectors(keywords=keywords, categories_lines=categories_lines)
#Search Part
print("\n ############# Started Search Query System ############# \n")
query = input('Input your search query: ')
while query != '':
query_terms = query.split()
search_sorted_results = self.get_search_results(query_terms, keywords, inverted_index)
print(f"==> search results for query: {query.split()}")
for search_result in search_sorted_results:
print(f"{search_result.strip()} - {str(titles[search_result]).strip()}")
#ask again for query or quit the while loop if no query is given
query = input('Input your search query [hit return to finish]: ')
print("\n ############# Started Recommendation Algorithm System ############# \n")
# Recommendation ALgorithm Part
seed_courseid = (input('Input your seed courseid: '))
while seed_courseid != '':
seed_courseid = str(seed_courseid).strip()
recom_sorted_results = self.get_recommendation_results(seed_courseid, keywords, inverted_index, unit_vectors)
print('==> recommendation results:')
for rec_result in recom_sorted_results:
print(f"{rec_result.strip()} - {str(titles[rec_result]).strip()}")
get_dot_product_ = self.get_dot_product(seed_courseid, str(rec_result).strip(), unit_vectors)
print(f"Dot Product Value: {get_dot_product_}")
seed_courseid = (input('Input seed courseid [hit return to finish]:'))
if __name__ == '__main__':
obj = SearchRecommendationSystem()
obj.Final()
s2-categories.tsv
courseid category subcategory
21526 Design 3D & Animation
153082 Marketing Advertising
225436 Marketing Affiliate Marketing
19482 Office Productivity Apple
33883 Office Productivity Apple
59526 IT & Software Operating Systems
29219 Personal Development Career Development
35057 Personal Development Career Development
40751 Personal Development Career Development
65210 Personal Development Career Development
234414 Personal Development Career Development
Example of how s2-titles.txt looks
courseidXXXYYYZZZtitleXXXYYYZZZdescription
3586XXXYYYZZZLearning Tools for Mrs B's Science Classes This is a series of lessons that will introduce students to the learning tools that will be utilized throughout the schoXXXYYYZZZThis is a series of lessons that will introduce students to the learning tools that will be utilized throughout the school year The use of these tools serves multiple purposes 1 Allow the teacher to give immediate and meaningful feedback on work that is in progress 2 Allow students to have access to content and materials when outside the classroom 3 Provide a variety of methods for students to experience learning materials 4 Provide a variety of methods for students to demonstrate learning 5 Allow for more time sensitive correction grading and reflections on concepts that are assessed
Evidently unit_vectors is a dictionary, from which you extract to 2 values, u1 and u2.
But what are those? Evidently dicts as well (this iteration would not make sense with a list):
for dimension in u1:
if dimension in u2:
dot_product += u1[dimension] * u2[dimension]
But what is u1[dimension]? A list? An array.
Normally dict are access by key as you do here. There isn't a numpy style "vectorization". vals = list(u1.values()) gets a lists of all values, and conceivably that could be made into an array (if the elements are right)
arr1 = np.array(list(u1.values()))
and a np.dot(arr1, arr2) might work
You'll get the best answers if you give small concrete examples - with real working data (and skip the complex generating code). Focus on the core of the problem, so we can grasp the issue with a 30 second read!
===
Looking more in depth at your dot function; this replicates the core (I think). Initially I missed the fact that you aren't iterating on u2 keys, but rather seeking matching ones.
def foo(dd):
x = 0
u1 = dd['u1']
u2 = dd['u2']
for k in u1:
if k in u2:
x += u1[k]*u2[k]
return x
Then making a dictionary of dictionaries:
In [30]: keys=list('abcde'); values=[1,2,3,4,5]
In [31]: adict = {k:v for k,v in zip(keys,values)}
In [32]: dd = {'u1':adict, 'u2':adict}
In [41]: dd
Out[41]:
{'u1': {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5},
'u2': {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}}
In [42]: foo(dd)
Out[42]: 55
In this case the subdictionaries match, so we get the same value with a simple array dot:
In [43]: np.dot(values,values)
Out[43]: 55
But if u2 was different, with different key/value pairs, and possibly different keys the result will be different. I don't see a way around the iterative access by keys. The sum-of-products part of the job is minor compared to the dictionary access.
In [44]: dd['u2'] = {'e':3, 'f':4, 'a':3}
In [45]: foo(dd)
Out[45]: 18
We could construct other data structures that are more suitable to a fast dot like calculation. But that's another topic.
Modified method
def get_dot_product(self, courseid1, courseid2, unit_vectors):
# u1 = unit_vectors[courseid1]
# u2 = unit_vectors[courseid2]
# dimensions = set(u1).intersection(set(u2))
# dot_product = sum(u1[dimension] * u2.get(dimension, 0) for dimension in dimensions)
u1 = unit_vectors[courseid1]
u2 = unit_vectors[courseid2]
dot_product = sum(u1[dimension] * u2.get(dimension, 0) for dimension in u2)
return dot_product

Movie File parse file into a dictionary of the form

1.6. Recommend a Movie
Create a function that counts how many keywords are similar in a set of movie reviews
and recommend the movie with the most similar number of keywords.
The solution to this task will require the use of dictionaries.
The film reviews & keywords are in a file called film_reviews.txt, separated by commas.
The first term is the movie name, the remaining terms are the film’s keyword tags (i.e.,
“amazing", “poetic", “scary", etc.).
Function name: similar_movie()
Parameters/arguments: name of a movie
Returns: a list of movies similar to the movie passed as an argument
film_reviews.txt -
7 Days in Entebbe,fun,foreign,sad,boring,slow,romance
12 Strong,war,violence,foreign,sad,action,romance,bloody
A Fantastic Woman,fun,foreign,sad,romance
A Wrinkle in Time,book,witty,historical,boring,slow,romance
Acts of Violence,war,violence,historical,action
Annihilation,fun,war,violence,gore,action
Armed,foreign,sad,war,violence,cgi,fancy,action,bloody
Black '47,fun,clever,witty,boring,slow,action,bloody
Black Panther,war,violence,comicbook,expensive,action,bloody
I think this could work for you
film_data = {'films': {}}
with open('film_reviews.txt', 'r') as f:
for line in f.readlines():
data = line.split(',')
data[-1] = data[-1].strip() # removing new line character
film_data['films'][data[0].lower()] = data[1:]
def get_smilar_movie(name):
if name.lower() in film_data['films'].keys():
original_review = film_data['films'][name.lower()]
similarities = dict()
for key in film_data['films']:
if key == name.lower():
continue
else:
similar_movie_review = set(film_data['films'][key])
overlap = set(original_review) & similar_movie_review
universe = set(original_review) | similar_movie_review
# % of overlap compared to the first movie = output1
output1 = float(len(overlap)) / len(set(original_review)) * 100
# % of overlap compared to the second movie = output2
output2 = float(len(overlap)) / len(similar_movie_review) * 100
# % of overlap compared to universe
output3 = float(len(overlap)) / len(universe) * 100
similarities[output1 + output2 + output3] = dict()
similarities[output1 + output2 + output3]['reviews'] = film_data['films'][key]
similarities[output1 + output2 + output3]['movie'] = key
max_similarity = max(similarities.keys())
movie2 = similarities[max_similarity]
print(name,' reviews ',film_data['films'][name.lower()])
print('similar movie ',movie2)
print('Similarity = {0:.2f}/100'.format(max_similarity/3))
return movie2['movie']
return None
The get_similar_movie function will return the most similar movie from the film_data dict. The function will take a movie name as argument.

Rankings are incorrect

I'm ranking the top ten basketball players in the NBA via points, minutes, free throws, and efficiency. However, when I go to print the rankings the rankings are incorrect. It seems to be ranking them in terms of the value of each character in the numbers but I want them to be ranked by the highest amount to the lowest amount of those previously mentioned values.
My code:
def readData(filename):
inputFile = open(filename, 'r')
inputFile.readline()
master_data_list = []
for line in inputFile:
master_data_list.append(line.split(","))
return master_data_list
def points(master_data_list):
pointList = []
for player in master_data_list[:-3]:
index = (player[1], player[2], player[6])
pointList.append(index)
return pointList
def minutes(master_data_list):
minutesList = []
for player in master_data_list[:-3]:
index = (player[1], player[2], player[5])
minutesList.append(index)
return minutesList
def freethrows(master_data_list):
freethrowsList = []
for player in master_data_list[:-3]:
index = (player[1], player[2], player[18])
freethrowsList.append(index)
return freethrowsList
def efficiency(master_data_list):
effList = []
for player in master_data_list:
formula = int(((player[6] + player[9] + player[10] + player[11] + player[12])-((player[15] - player[16]) + (player[17] - player[18]) + player[13]))/(player[4]))
index = (player[1], player[2], formula)
effList.append(formula)
return effList
def main():
master_data_list = readData("player_career.csv")
pointList = points(master_data_list)
minutesList = minutes(master_data_list)
freethrowsList = freethrows(master_data_list)
#effList = efficiency(master_data_list)
#got this code from
pointList = sorted(pointList, key = lambda x: x[2], reverse = True)
minutesList = sorted(minutesList, key = lambda x: x[2], reverse = True)
freethrowsList = sorted(freethrowsList, key = lambda x: x[2], reverse = True)
#effList = sorted(effList, key = lambda x: x[2], reverse = True)
print("Top 10 players based on total points scored.")
for line in pointList[:10]:
print(line[0], line[1]+"-"+line[2])
print()
print("Top 10 players based on total minutes.")
for line in minutesList[:10]:
print(line[0], line[1]+"-"+line[2])
print()
print("Top 10 players based on total free throws.")
for line in freethrowsList[:10]:
print(line[0], line[1]+"-"+line[2])
print()
"""
print("Top 10 players based on total efficiency.")
for line in effList[:10]:
print(line[0], line[1]+"-"+line[2])
print()
"""
The output of the code:
Top 10 players based on total points scored.
Terrell Brandon-9994
Rony Seikaly-9991
David Vaughn-998
Buddy Jeannette-997
Irv Torgoff-997
Greg Ballard-9953
Ralph Simpson-9953
John Lucas-9951
Don Kojis-9948
Phil Chenier-9931
Top 10 players based on total minutes.
Fred Hoiberg-9976
Charlie Johnson-9972
Stewart Granger-996
Gary Gregor-996
Keith Bogans-9957
Al Wood-9939
Kenny Gattison-9930
Willis Bennett-993
Jack Molinas-993
Corie Blount-9925
Top 10 players based on total free throws.
Kurt Rambis-999
Charlie Scott-998
Walt Wesley-998
Albert King-996
Lucious Harris-994
Johnny Neumann-991
Frank Johnson-990
Mardy Collins-99
Calvin Garrett-99
Bob Lackey-99
I haven't done efficiency yet so it's not included.
Here's a sample of the input file:
player stats
If someone has an idea to get them to rank correctly, thanks.
Please change this key = lambda x: x[2] into key = lambda x: int(x[2]) to compare integers, not strings.
You need to change the strings in your input into integers, like:
def points(master_data_list):
pointList = []
for player in master_data_list[:-3]:
index = (player[1], player[2], int(player[6]))
pointList.append(index)
return pointList

How can I optimise in term of time this python code

I write this code but I find it very slow and I don't know how to really improve it in term of time. data is a json object with approximately 70 000 key in it. I think the slowest part is the actors part because i'm iterating on a list (which contain at most 3 elements).
genres_number = {}
actors_number = {}
for movie in data:
for genre in data[movie]["genres"]:
if data[movie]["actors"] != None:
for actor in data[movie]["actors"]:
if actor not in actors_number.keys():
actors_number[actor] = 1
else:
actors_number[actor] = actors_number[actor] + 1
if genre not in genres_number.keys():
genres_number[genre] = 1
else:
genres_number[genre] = genres_number[genre] + 1
res = []
res.append(genres_number)
res.append(actors_number)
return res
How does this work for you
from collections import defaultdict
def get_stats(data):
genres_number = defaultdict(int)
actors_number = defaultdict(int)
for movie in data:
actors = movie.get('actors')
if actors:
for actor in actors:
actors_number[actor] += 1
genres = movie.get('genres')
for genre in genres:
genres_number[actor] += 1
res = []
res.append(dict(genres_number))
res.append(dict(actors_number))
return res

convert contents of metadata file into variables list

Hi I m wanting to convert the contents of a file (in this case a Landsat 7 metadata file) into a series of variables defined by the contents of the file using Python 2.7. The file contents looks like this:
GROUP = L1_METADATA_FILE
GROUP = METADATA_FILE_INFO
ORIGIN = "Image courtesy of the U.S. Geological Survey"
REQUEST_ID = "0101305309253_00043"
LANDSAT_SCENE_ID = "LE71460402010069SGS00"
FILE_DATE = 2013-06-02T11:19:59Z
STATION_ID = "SGS"
PROCESSING_SOFTWARE_VERSION = "LPGS_12.2.1"
DATA_CATEGORY = "NOMINAL"
END_GROUP = METADATA_FILE_INFO
GROUP = PRODUCT_METADATA
DATA_TYPE = "L1T"
ELEVATION_SOURCE = "GLS2000"
OUTPUT_FORMAT = "GEOTIFF"
EPHEMERIS_TYPE = "DEFINITIVE"
SPACECRAFT_ID = "LANDSAT_7"
SENSOR_ID = "ETM"
SENSOR_MODE = "BUMPER"
WRS_PATH = 146
WRS_ROW = 040
DATE_ACQUIRED = 2010-03-10
GROUP = IMAGE_ATTRIBUTES
CLOUD_COVER = 0.00
IMAGE_QUALITY = 9
SUN_AZIMUTH = 137.38394502
SUN_ELEVATION = 48.01114126
GROUND_CONTROL_POINTS_MODEL = 55
GEOMETRIC_RMSE_MODEL = 3.790
GEOMETRIC_RMSE_MODEL_Y = 2.776
GEOMETRIC_RMSE_MODEL_X = 2.580
END_GROUP = IMAGE_ATTRIBUTES
Example of interested variable items:
GROUP = MIN_MAX_RADIANCE
RADIANCE_MAXIMUM_BAND_1 = 293.700
RADIANCE_MINIMUM_BAND_1 = -6.200
RADIANCE_MAXIMUM_BAND_2 = 300.900
RADIANCE_MINIMUM_BAND_2 = -6.400
RADIANCE_MAXIMUM_BAND_3 = 234.400
RADIANCE_MINIMUM_BAND_3 = -5.000
RADIANCE_MAXIMUM_BAND_4 = 241.100
RADIANCE_MINIMUM_BAND_4 = -5.100
RADIANCE_MAXIMUM_BAND_5 = 47.570
RADIANCE_MINIMUM_BAND_5 = -1.000
RADIANCE_MAXIMUM_BAND_6_VCID_1 = 17.040
RADIANCE_MINIMUM_BAND_6_VCID_1 = 0.000
RADIANCE_MAXIMUM_BAND_6_VCID_2 = 12.650
RADIANCE_MINIMUM_BAND_6_VCID_2 = 3.200
RADIANCE_MAXIMUM_BAND_7 = 16.540
RADIANCE_MINIMUM_BAND_7 = -0.350
RADIANCE_MAXIMUM_BAND_8 = 243.100
RADIANCE_MINIMUM_BAND_8 = -4.700
END_GROUP = MIN_MAX_RADIANCE
I am open to other ideas as I don't need all entries as variables, just a selection. And I see some headers are listed more than once. i.e. GROUP is used multiple times. I need to be able to select certain variables (integer values) and use in formulas in other areas of code. ANY help would be appreciated (novice python coder).
I'm not sure exactly what you are looking for, but maybe something like this:
s = '''GROUP = L1_METADATA_FILE
GROUP = METADATA_FILE_INFO
ORIGIN = "Image courtesy of the U.S. Geological Survey"
REQUEST_ID = "0101305309253_00043"
LANDSAT_SCENE_ID = "LE71460402010069SGS00"
FILE_DATE = 2013-06-02T11:19:59Z
STATION_ID = "SGS"
PROCESSING_SOFTWARE_VERSION = "LPGS_12.2.1"
DATA_CATEGORY = "NOMINAL"
END_GROUP = METADATA_FILE_INFO
GROUP = PRODUCT_METADATA
DATA_TYPE = "L1T"
ELEVATION_SOURCE = "GLS2000"
OUTPUT_FORMAT = "GEOTIFF"
EPHEMERIS_TYPE = "DEFINITIVE"
SPACECRAFT_ID = "LANDSAT_7"
SENSOR_ID = "ETM"
SENSOR_MODE = "BUMPER"
WRS_PATH = 146
WRS_ROW = 040
DATE_ACQUIRED = 2010-03-10'''
output = {} #Dict
for line in s.split("\n"): #Iterates through every line in the string
l = line.split("=") #Seperate by "=" and put into a list
output[l[0].strip()] = l[1].strip() #First word is key, second word is value
print output #Output is a dictonary containing all key-value pairs in your metadata seperated by "="
print output["SENSOR_ID"] #Outputs "ETM"
==============
Edited:
f = open('metadata.txt', 'r') #open file for reading
def build_data(f): #build dictionary
output = {} #Dict
for line in f.readlines(): #Iterates through every line in the string
if "=" in line: #make sure line has data as wanted
l = line.split("=") #Seperate by "=" and put into a list
output[l[0].strip()] = l[1].strip() #First word is key, second word is value
return output #Returns a dictionary with the key, value pairs.
data = build_data(f)
print data["IMAGE_QUALITY"] #prints 9

Categories