processing independent values from iterating through a list of dictionaries - python

Faultysats=[{'G11': '16 01 13 09 43 50.0000000'},
{'G11': '16 01 13 09 43 51.0000000'},
{'G03': '16 01 13 09 43 52.0000000'}]
SATS=['G01', 'G03', 'G04', 'G08', 'G11', 'G19', 'G28', 'G32']
EPOCH='16 01 13 09 43 51.0000000'
I have these lists: Faultysats, a list of dictionaries mapping a satellite to an epoch time; SATS, a list of satellite IDs; and EPOCH, a single time value.
If a satellite from Faultysats (e.g. 'G11') appears in SATS AND its corresponding epoch from Faultysats (e.g. '16 01 13 09 43 50.0000000') equals EPOCH, I want to know which index that satellite is at in the SATS list.
I hope that makes sense. I'm struggling because I don't know how to get at the varying keys and values in a list of dictionaries. Is there a particular operator or idiom for extracting data from a list like this?

To get a list of the indexes you could use:
Faultysats=[{'G11': '16 01 13 09 43 50.0000000'},
{'G11': '16 01 13 09 43 51.0000000'},
{'G03': '16 01 13 09 43 52.0000000'}]
SATS=['G01', 'G03', 'G04', 'G08', 'G11', 'G19', 'G28', 'G32']
EPOCH='16 01 13 09 43 51.0000000'
indexes = []
for i in range(len(Faultysats)):
    for key in Faultysats[i]:
        if (key in SATS) and (Faultysats[i][key] == EPOCH):
            indexes.append(SATS.index(key))  # position of the satellite in SATS
print(indexes)

How about this:
First get the keys from the list of dictionaries whose corresponding value equals EPOCH, then get the index of each of those keys in SATS:
>>> Faultysats=[{'G11': '16 01 13 09 43 50.0000000'},
...             {'G11': '16 01 13 09 43 51.0000000'},
...             {'G03': '16 01 13 09 43 52.0000000'}]
>>> SATS=['G01', 'G03', 'G04', 'G08', 'G11', 'G19', 'G28', 'G32']
>>> EPOCH='16 01 13 09 43 51.0000000'
>>> f = [k for d in Faultysats for k,v in d.items() if EPOCH == v]
>>> indx = [SATS.index(x) for x in f]
>>>
>>> indx
[4]
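One caveat: SATS.index(x) raises a ValueError if a key from Faultysats is not in SATS, so a slightly more defensive variant (a small sketch, not part of the original answer) guards for that:
>>> indx = [SATS.index(x) for x in f if x in SATS]
>>> indx
[4]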

First, let's get a list of all satellites from Faultysats that are in SATS and have the EPOCH timestamp.
sats = [sat for pair in Faultysats for sat, epoch in pair.items()
        if sat in SATS and epoch == EPOCH]
>>> sats
['G11']
Now you can use a dictionary comprehension to provide the index location of each satellite in SATS (assuming there are no duplicates).
>>> {s: SATS.index(s) for s in sats}
{'G11': 4}
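If SATS were long, calling SATS.index repeatedly would rescan the list each time; a small variation (just a sketch, unnecessary for a list of this size) builds the position lookup once:
>>> pos = {s: i for i, s in enumerate(SATS)}
>>> {s: pos[s] for s in sats}
{'G11': 4}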


Pandas: a possible one-liner that iterates through a list within a cell and extracts combinations from within that list, using df.apply()

Rather than confuse things with an explanation, I'll let the code show what I'm trying to achieve.
I'm comparing address-article combinations in one dataframe against another.
The two dataframes each have the following:
an address
the string of the articles from which it was extracted (in list format)
I WANT TO FIND: the address-article combinations in one dataframe, say reg, that are not present in the other, look. The df names refer to the method by which the addresses were extracted from the articles.
Note: the addresses 'AZ 08' and '08 AZ' should be treated as the same.
reg = pd.DataFrame({'Address': {0: 'AZ 08',1: '04 CA',2: '10 FL',3: 'NY 30'},
'Article': {0: '[\'Location AZ 08 here\', \'Went to 08 AZ\']',
1: '[\'Place 04 CA here\', \'Going to 04 CA\', \'Where is 04 CA\']',
2: '[\'This is 10 FL \', \'Coming from FL 10\']',
3: '[\'Somewhere around NY 30\']'}})
look = pd.DataFrame({'Address': {0: 'AZ 08',1: '04 CA',2: 'NY 30' },
'Article': {0: '[\'Location AZ 08 here\']',
1: '[\'Place 04 CA here\', \'Going to 04 CA\', \'Where is 04 CA\']',
2: '[\'Somewhere around NY 30\', \'Almost at 30 NY\']'}})
What I was able to find is the records in which there is a mismatch, but I am unable to get address-article level information.
My method is shown below.
import ast

def make_set_expanded(string, review):
    rev_l = ast.literal_eval(review)
    s = set(str(string).lower().split())
    s.update(rev_l)
    return s
reg_list_expand = reg.apply(lambda x: make_set_expanded(x['Address'], x['Article']), axis=1).to_list()
look_list_expand = look.apply(lambda x: make_set_expanded(x['Address'], x['Article']), axis=1).to_list()
reg_diff = reg[reg.apply(lambda x: 'Yes' if make_set_expanded(x['Address'], x['Article']) in look_list_expand else 'No', axis=1) == 'No']
look_diff = look[look.apply(lambda x: 'Yes' if make_set_expanded(x['Address'], x['Article']) in reg_list_expand else 'No', axis=1) == 'No']
Overall, the functions:
treat an address 'AZ 08' and '08 AZ' as the same
show missing addresses
show addresses which came from a different article
But instead of showing the whole set as is (i.e. including the combinations which already have a match), I would like to show only the particular combination that is missing.
For example, in reg_diff, instead of showing the whole set again, I'd like to see only the missing address-article combination in that row:
'AZ 08': 'Went to 08 AZ'
IIUC, try:
convert your "Article" column from string to list
explode to get each article in a separate row.
outer merge with indicator=True to identify which DataFrame each row comes from
filter the merged dataframe to get the required output.
reg["Article"] = reg["Article"].str[1:-1].str.split(", ")
reg = reg.explode("Article")
look["Article"] = look["Article"].str[1:-1].str.split(", ")
look = look.explode("Article")
merged = reg.merge(look, how="outer", indicator=True)
reg_diff = merged[merged["_merge"].eq("left_only")].drop("_merge", axis=1)
look_diff = merged[merged["_merge"].eq("right_only")].drop("_merge", axis=1)
>>> reg_diff
Address Article
1 AZ 08 'Went to 08 AZ'
5 10 FL 'This is 10 FL '
6 10 FL 'Coming from FL 10'
>>> look_diff
Address Article
8 NY 30 'Almost at 30 NY'
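The question also asks that 'AZ 08' and '08 AZ' be treated as the same address, whereas the merge above compares the Address strings literally. One way to fold that in (a sketch only, assuming that sorting the space-separated tokens is an acceptable canonical form) is to normalise the Address column before exploding and merging:
def canon(addr):
    # 'AZ 08' and '08 AZ' both become '08 AZ'
    return " ".join(sorted(addr.split()))

reg["Address"] = reg["Address"].map(canon)
look["Address"] = look["Address"].map(canon)
# ...then split, explode and merge exactly as above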
I'm not fully sure what your logic is meant to do, so I'll start with some example methods that may be useful:
An example of how to use .apply(eval) to turn the text into lists, and of using .explode() to turn those lists into rows.
def format_df(df):
    df = df.copy()
    df.Article = df.Article.apply(eval)
    df = df.explode("Article")
    return df.reset_index(drop=True)
reg, look = [format_df(x) for x in [reg, look]]
print(reg)
print(look)
Output:
Address Article
0 AZ 08 Location AZ 08 here
1 AZ 08 Went to 08 AZ
2 04 CA Place 04 CA here
3 04 CA Going to 04 CA
4 04 CA Where is 04 CA
5 10 FL This is 10 FL
6 10 FL Coming from FL 10
7 NY 30 Somewhere around NY 30
Address Article
0 AZ 08 Location AZ 08 here
1 04 CA Place 04 CA here
2 04 CA Going to 04 CA
3 04 CA Where is 04 CA
4 NY 30 Somewhere around NY 30
5 NY 30 Almost at 30 NY
Example of packing rows back into lists:
reg = reg.groupby('Address', as_index=False).agg(list)
look = look.groupby('Address', as_index=False).agg(list)
print(reg)
print(look)
Output:
Address Article
0 04 CA [Place 04 CA here, Going to 04 CA, Where is 04...
1 10 FL [This is 10 FL , Coming from FL 10]
2 AZ 08 [Location AZ 08 here, Went to 08 AZ]
3 NY 30 [Somewhere around NY 30]
Address Article
0 04 CA [Place 04 CA here, Going to 04 CA, Where is 04...
1 AZ 08 [Location AZ 08 here]
2 NY 30 [Somewhere around NY 30, Almost at 30 NY]

Check for blanks at specified positions in a string

I have the following problem, which I have solved in a very long-winded way, and I would like to know if there is a better approach. I have the following string structure:
text = 01 ARA 22 - 02 GAG 23
But due to processing, sometimes the spaces are not added properly and the text may look like this:
text = 04 GOR23- 02 OER 23
text = 04 ORO 21-02 RRO 24
text = 04 DRE25- 12 RIS21
When they should look as follows:
text = 04 GOR 23 - 02 OER 23
text = 04 ORO 21 - 02 RRO 24
text = 04 DRE 25 - 12 RIS 21
To add the space in those specific positions, I basically check whether a space exists at each expected position of the string and insert one if it does not.
Is there a more efficient way to do this in Python?
I appreciate any advice.
You can use a regex to capture each component of the text and then rebuild the string with the correct spacing:
import re
regex = re.compile(r'(\d{2})\s*([A-Z]{3})\s*(\d{2})\s*-\s*(\d{2})\s*([A-Z]{3})\s*(\d{2})')
text = ['04 GOR23- 02 OER 23',
'04 ORO 21-02 RRO 24',
'04 DRE25- 12 RIS21']
[regex.sub(r'\1 \2 \3 - \4 \5 \6', t) for t in text]
Output:
['04 GOR 23 - 02 OER 23',
'04 ORO 21 - 02 RRO 24',
'04 DRE 25 - 12 RIS 21']
Here is another way to do so:
data = '04 GOR23- 02 OER 23'
new_data = "".join(char if not i in [2, 5, 7, 8, 10, 13] else f" {char}" for i, char in enumerate(data.replace(" ", "")))

Splitting multiple, spaced-list values in Python [duplicate]

This question already has answers here:
Apply function to each element of a list
(4 answers)
How do I make a flat list out of a list of lists?
(34 answers)
Closed 1 year ago.
I have a very simple list that looks like this:
['20 32 35 47 64', '15 17 25 32 53', '07 10 12 61 65', '08 14 31 58 68', '01 10 44 47 56']
What I would like to do is split the values within each string so that they are displayed as follows:
[20,32,35,47,64,15,17,25,32,53,07,10,12,61,65,..]
myvallist = myvalues.split(" ")
print (myvallist)
For some reason, when I attempt to use .split(), PyCharm throws an error.
Traceback (most recent call last):
File "C:\Users\VFARETR.CENTRAL\Desktop\pyCharm\megaTest", line 25, in <module>
myvallist = myvalues.split(" ")
AttributeError: 'list' object has no attribute 'split'
Any help would be great!
You're trying to split the list, not the string. Try this:
input_list = ['20 32 35 47 64', '15 17 25 32 53', '07 10 12 61 65', '08 14 31 58 68', '01 10 44 47 56']
complete_list = []
for sub_list in input_list:
    current_items = sub_list.split()
    complete_list.extend(current_items)
It says so because your list consists of strings. You have to access each string inside the list before you can split it.
mylist = ['20 32 35 47 64', '15 17 25 32 53', '07 10 12 61 65',
'08 14 31 58 68', '01 10 44 47 56']
myvallist = []
for item in mylist:
    myvallist += item.split(' ')  # adds the numbers of the current string to myvallist
print(myvallist)
I don't know if you want the entries to be integers or strings but here's a solution with them as strings.
my_list = ['20 32 35 47 64', '15 17 25 32 53', '07 10 12 61 65', '08 14 31 58 68', '01 10 44 47 56']
my_new_list = [item for sublist in [item.split(" ") for item in my_list] for item in sublist]
The inner list comprehension splits each string by spaces, and the outer list comprehension flattens the resulting list of lists.
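If integers are wanted instead, a small variation (note that values such as '07' lose their leading zero once converted):
my_int_list = [int(n) for s in my_list for n in s.split()]
# [20, 32, 35, 47, 64, 15, 17, 25, 32, 53, 7, 10, 12, 61, 65, ...]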

Regex - Skip subsequent matches (Python)

I am trying to delimit the following into a table, but I'm running into issues because the name can have 2 spaces in it; otherwise "[\s]{2,}" would work. I also can't ignore whitespace between letters, since the 1st column ends with a letter and the 2nd column starts with a letter.
I would like to skip any whitespace in between letters after the 1st occurrence.
String:
> TESTID DR5 777777 0 50000 TEST NAME 23.40 600000.00 1000000 20 5 09 05 18 09 07 18 3876.00
TESTID
DR5 777777 0
50000
TEST NAME
23.40
600000.00
1000000 20 5
09 05 18
09 07 18
3876.00
I will try to solve your stated problem (rather than the regex approach itself), because I don't fully understand the regex question.
If I were going to make that string into a list, I would do it like this:
my_str = "TESTID DR5 777777 0 50000 TEST NAME 23.40 600000.00 1000000 20 5 09 05 18 09 07 18 3876.00"
my_list = [section for section in my_str.split(" ") if section != ""]
This uses list comprehension to filter out the blank strings from the split.
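Note that str.split with no argument already splits on runs of whitespace and drops the empty strings, so an equivalent shorter form is:
my_list = my_str.split()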
You can also use a regular expression as the separator.
import re
my_str = "TESTID DR5 777777 0 50000 TEST NAME 23.40 600000.00 1000000 20 5 09 05 18 09 07 18 3876.00"
my_list = re.split(r'\s{2,}', my_str)
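For example, on a shortened, hypothetical line with the field separators restored as double spaces (if the name itself contains two spaces, as the question warns, it will still be split):
s = "TESTID  DR5 777777 0  50000  TEST NAME  23.40"
print(re.split(r'\s{2,}', s))
# ['TESTID', 'DR5 777777 0', '50000', 'TEST NAME', '23.40']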

How to read Deep-ocean Assessment and Reporting of Tsunamis (DART®) data in python

I am trying to plot water column height with Python from DART data:
http://www.ndbc.noaa.gov/data/dart_deployment_realtime/23228.dart
import pandas as pd
link = "http://www.ndbc.noaa.gov/data/dart_deployment_realtime/23228.dart"
data = pd.read_table(link)
but the data come back as a single column and I can't access the separated fields:
Site: 23228
0 #Paroscientific Serial: W23228
1 #YY MM DD hh mm ss T HEIGHT
2 #yr mo dy hr mn s - m
3 2014 08 08 06 00 00 1 2609.494
4 2014 08 08 05 45 00 1 2609.550
5 2014 08 08 05 30 00 1 2609.605
6 2014 08 08 05 15 00 1 2609.658
7 2014 08 08 05 00 00 1 2609.703
8 2014 08 08 04 45 00 1 2609.741
9 2014 08 08 04 30 00 1 2609.769
10 2014 08 08 04 15 00 1 2609.787
11 2014 08 08 04 00 00 1 2609.799
12 2014 08 08 03 45 00 1 2609.802
For example, I just want the HEIGHT values as a NumPy array, but I don't know how to access this specific column.
With pure Python (no NumPy) I would use the csv module:
import urllib2
import csv
u = urllib2.urlopen('http://www.ndbc.noaa.gov/data/dart_deployment_realtime/23228.dart')
r = csv.reader(u, delimiter=' ')
# skip the four header lines
for _ in range(4):
    next(r, None)
Now r is an iterator which gives one row (a list of 8 items) at a time for whatever you need. Of course, if you need a list of lists, you can just do list(r).
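Building on that iterator, the heights could then be pulled out like this (a sketch, assuming each data row really does split into exactly 8 fields as described above):
heights = [float(row[7]) for row in r]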
However, as you are handling a rather large amount of data, you will probably want to use NumPy. In that case:
import urllib2
import numpy as np
u = urllib2.urlopen('http://www.ndbc.noaa.gov/data/dart_deployment_realtime/23228.dart')
arr = np.loadtxt(u, skiprows=3)
This gives you an array of 92551 x 8 values.
Accessing the heights as a NumPy array is then simple:
arr[:,7]
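Since the goal is to plot the water column height, a minimal plotting sketch on top of that array (assuming matplotlib is available):
import matplotlib.pyplot as plt
plt.plot(arr[:, 7])
plt.ylabel('HEIGHT (m)')
plt.show()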
Pandas is another possibility, as you correctly thought. It is just a matter of a few parameters...
import urllib2
import pandas as pd
link = 'http://www.ndbc.noaa.gov/data/dart_deployment_realtime/23228.dart'
df = pd.read_table(link, delimiter=r'\s+', skiprows=[1,3], header=1)
Now you have a nice DataFrame with df["HEIGHT"] as the height. (The column names are taken from row 2 of the file.)
And for the plotting...
df["HEIGHT"].plot()
creates the plot of the height series (image not reproduced here).
(Then I guess you will ask how to get the proper date on the X axis. I think that is worth a completely new question...)
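For completeness, one way to get a time axis (a sketch, assuming the parse above leaves the columns named '#YY', 'MM', 'DD', 'hh', 'mm' and 'ss'):
df.index = pd.to_datetime(dict(year=df['#YY'], month=df['MM'], day=df['DD'],
                               hour=df['hh'], minute=df['mm'], second=df['ss']))
df['HEIGHT'].plot()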
Perhaps you can modify the following:
import urllib2, numpy
response = urllib2.urlopen('http://www.ndbc.noaa.gov/data/dart_deployment_realtime/23228.dart')
all_lines = response.read().splitlines()
lines_of_interest = all_lines[4:]
heights = numpy.zeros(len(lines_of_interest), dtype=float)
for idx, line in enumerate(lines_of_interest):
    heights[idx] = float(line.split()[7])
Then:
>>> heights.shape
(92551,)
>>> heights
array([ 2609.27 , 2609.213, 2609.153, ..., 2611.157, 2611.084,
2611.008])
Etc.
