Getting data into a map - python

I have my .dat data formatted into arrays I could use in graphs and whatnot.
I got my data from this website; it requires an account if you want to download it yourself, but the data is provided below as well.
https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1028
data in python:
import pandas as pd
df = pd.read_csv("ocean_flux_co2_2d.dat", header=None)
print(df.head())
         0     1         2         3
0  -178.75 -77.0  0.000003   32128.7
1  -176.25 -77.0  0.000599   32128.7
2  -173.75 -77.0  0.001649   39113.5
3  -171.25 -77.0  0.003838   58934.0
4  -168.75 -77.0  0.007192  179959.0
I then decided to put this data into arrays that could be put into graphs and other functions.
Like so:
import numpy as np

lat = []
lon = []
sed = []
area = []
with open('/home/srowpie/SrowFinProj/Datas/ocean_flux_tss_2d.dat') as f:
    for line in f:
        parts = line.split(',')
        lat.append(float(parts[0]))
        lon.append(float(parts[1]))
        sed.append(float(parts[2]))
        area.append(float(parts[3]))
lat = np.array(lat)
lon = np.array(lon)
sed = np.array(sed)
area = np.array(area)
My question now is how can I put this data into a map with data points? Column 1 is latitude, Column 2 is longitude, Column 3 is sediment flux, and Column 4 is the area covered. Or do I have to bootleg it by making a graph that takes into account the variables lat, lon, and sed?

You don't need to build the arrays by hand. Just use df.values and you get a NumPy array of all the data in the dataframe.
Example -
array([[-1.78750e+02, -7.70000e+01, 3.00000e-06, 3.21287e+04],
[-1.76250e+02, -7.70000e+01, 5.99000e-04, 3.21287e+04],
[-1.73750e+02, -7.70000e+01, 1.64900e-03, 3.91135e+04],
[-1.71250e+02, -7.70000e+01, 3.83800e-03, 5.89340e+04],
[-1.68750e+02, -7.70000e+01, 7.19200e-03, 1.79959e+05]])
I wouldn't recommend storing individual columns as variables. Instead, just set the column names for the dataframe and then use them to extract a pandas Series of the data in each column.
df.columns = ["Latitude", "Longitude", "Sediment Flux", "Area covered"]
This is what the table would look like after this:

   Latitude  Longitude  Sediment Flux  Area covered
0   -178.75      -77.0       0.000003       32128.7
1   -176.25      -77.0       0.000599       32128.7
2   -173.75      -77.0       0.001649       39113.5
3   -171.25      -77.0       0.003838       58934.0
4   -168.75      -77.0       0.007192      179959.0
Simply do df[column_name] to get the data in that column.
For example -> df["Latitude"]
Output -
0 -178.75
1 -176.25
2 -173.75
3 -171.25
4 -168.75
Name: Latitude, dtype: float64
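The column labels can also be assigned at read time via read_csv's names= parameter instead of setting df.columns afterwards. A small sketch, using io.StringIO to stand in for the .dat file (the five sample rows are taken from the output above):

```python
import io

import pandas as pd

# the five sample rows, standing in for the comma-separated .dat file
raw = """-178.75,-77.0,0.000003,32128.7
-176.25,-77.0,0.000599,32128.7
-173.75,-77.0,0.001649,39113.5
-171.25,-77.0,0.003838,58934.0
-168.75,-77.0,0.007192,179959.0"""

# names= assigns the labels up front instead of the default 0..3
df = pd.read_csv(io.StringIO(raw), header=None,
                 names=["Latitude", "Longitude", "Sediment Flux", "Area covered"])
print(df["Latitude"])
```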
Once you have done all this, you can use folium to plot the rows on real interactive maps.
import folium as fl

m = fl.Map(df.iloc[0, :2], zoom_start=10)  # note: map tile zoom levels only go up to ~18
for index in df.index:
    row = df.loc[index, :]
    # first two columns are the location; the remaining ones become the popup text
    fl.Marker(row[:2].values, f"{dict(row[2:])}").add_to(m)
m
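If you want to see exactly what each Marker receives before rendering the map, the (location, popup) pairs can be built with pandas alone; a small sketch on two of the sample rows, with no folium required:

```python
import io

import pandas as pd

# two of the sample rows, standing in for the full dataframe
raw = """-178.75,-77.0,0.000003,32128.7
-176.25,-77.0,0.000599,32128.7"""
df = pd.read_csv(io.StringIO(raw), header=None)
df.columns = ["Latitude", "Longitude", "Sediment Flux", "Area covered"]

markers = []
for index in df.index:
    row = df.loc[index, :]
    # the same (location, popup) arguments the fl.Marker(...) call above receives
    markers.append((list(row[:2].values), dict(row[2:])))

print(markers[0])
```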

Related

having a route between two gps coordinates

Good evening,
I have GPS coordinates for each trip and I'm trying to draw a line between the points of each trip.
I'm using the code below, but it still doesn't work. If I delete the groupby('id') it works, but then I also get lines between points that don't belong to the same trip id:
tooltip = "Click me!"
for i in range(11):
    folium.Marker(df.groupby('id')
                  [df['latitude'][i], df['longitude'][i]], popup=df['id'][i], tooltip=tooltip).add_to(map)
    route = folium.PolyLine(df.groupby('id')
                            [[df['latitude'][i], df['longitude'][i]], [df['latitude'][i+1], df['longitude'][i+1]]],
                            tooltip="trip").add_to(map)
My dataframe looks like this:

    longitude   latitude     id
0    5.184529  52.032471  66168
1    5.184513  52.032047  66168
2    5.184468  52.031559  66168
7    5.183908  52.027328  66168
8    5.175724  52.084732  89751
9    5.175513  52.084743  89751
10   5.174866  52.084713  89751
I suggest separating adding the polylines and the markers to the map. Markers can be added individually, the polylines as lists of geolocations. Since the latter needs to be clustered by id, it makes sense to add them per group, after the groupby:
import pandas as pd
import folium
import io

data = '''    longitude   latitude     id
0    5.184529  52.032471  66168
1    5.184513  52.032047  66168
2    5.184468  52.031559  66168
7    5.183908  52.027328  66168
8    5.175724  52.084732  89751
9    5.175513  52.084743  89751
10   5.174866  52.084713  89751'''
df = pd.read_csv(io.StringIO(data), sep=r'\s\s+', engine='python')

tooltip = "Click me!"
m = folium.Map(location=[52.031559, 5.184468],
               zoom_start=15)

# one marker per point
for index, row in df.iterrows():
    folium.Marker([row['latitude'], row['longitude']],
                  popup=row['id'],
                  tooltip=tooltip
                  ).add_to(m)

# one polyline per trip: cluster the coordinates by id first
for index, row in df.groupby('id', as_index=False)[['latitude', 'longitude']].agg(list).iterrows():
    loc = list(zip(row.latitude, row.longitude))
    folium.PolyLine(loc, tooltip="trip").add_to(m)
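The groupby/agg(list) step is worth checking on its own: each trip id ends up with one list of (lat, lon) tuples, which is exactly the shape folium.PolyLine expects. A small sketch on four of the sample points:

```python
import pandas as pd

# four of the sample points, two per trip id
df = pd.DataFrame({
    'longitude': [5.184529, 5.184513, 5.175724, 5.175513],
    'latitude':  [52.032471, 52.032047, 52.084732, 52.084743],
    'id':        [66168, 66168, 89751, 89751],
})

# one list of coordinates per trip, keyed by trip id
lines = {}
for index, row in df.groupby('id', as_index=False)[['latitude', 'longitude']].agg(list).iterrows():
    lines[row['id']] = list(zip(row.latitude, row.longitude))

print(lines)
```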

Fetching a csv that has a vector in one column c(,)

I was using R to get my data prepared; however, I find myself forced to use Python instead.
The csv files have been stored as sf dataframe, where a column geometry stores both long and lat.
In my files, I have the following structure:
a,geometry,b
50,c(-95.11, 10.19),32.24
60,,c(-95.12, 10.27),22.79
70,c(-95.13, 10.28),14.91
80,c(-95.14, 10.33),18.35
90,c(-95.15, 10.5),28.35
99,c(-95.16, 10.7),48.91
The aim here is to read the file while knowing that c(-95.11, 10.19) holds two values, lon and lat, so they can be stored in two different columns. However, having the separator inside the value, which is also not quoted as a string, makes this really hard to do.
The expected output should be :
a,long,lat,b
50,-95.11, 10.19,32.24
60,,-95.12, 10.27,22.79
70,-95.13, 10.28,14.91
80,-95.14, 10.33,18.35
90,-95.15, 10.5,28.35
99,-95.16, 10.7,48.91
Does this work (input file: data.csv; output file: data_out.csv)?

import csv

with open('data.csv', 'r') as fin, open('data_out.csv', 'w', newline='') as fout:
    reader, writer = csv.reader(fin), csv.writer(fout)
    next(reader)  # skip the header row
    writer.writerow(['a', 'long', 'lat', 'b'])
    for row in reader:
        row[1] = row[1][2:]    # 'c(-95.11' -> '-95.11'
        row[2] = row[2][1:-1]  # ' 10.19)'  -> '10.19'
        writer.writerow(row)
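The same slicing can be exercised in memory with io.StringIO before touching real files; a sketch using two of the well-formed sample lines:

```python
import csv
import io

raw = """a,geometry,b
50,c(-95.11, 10.19),32.24
70,c(-95.13, 10.28),14.91"""

fin, fout = io.StringIO(raw), io.StringIO()
reader, writer = csv.reader(fin), csv.writer(fout)
next(reader)  # skip the header
writer.writerow(['a', 'long', 'lat', 'b'])
for row in reader:
    # the csv reader splits 'c(-95.11, 10.19)' into 'c(-95.11' and ' 10.19)'
    row[1] = row[1][2:]    # 'c(-95.11' -> '-95.11'
    row[2] = row[2][1:-1]  # ' 10.19)'  -> '10.19'
    writer.writerow(row)

print(fout.getvalue())
```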
In your sample output there is a blank after the second column: is this intended? Also, your sample input has a double comma after the first column in line two?
If you were looking for an R-based solution, you may consider extracting the coordinates from the {sf} geometry column into regular columns, and saving accordingly.
Consider this example, built on three semi-random North Carolina cities:
library(sf)
library(dplyr)

cities <- data.frame(name = c("Raleigh", "Greensboro", "Wilmington"),
                     x = c(-78.633333, -79.819444, -77.912222),
                     y = c(35.766667, 36.08, 34.223333)) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)

cities # a class sf data.frame

Simple feature collection with 3 features and 1 field
geometry type:  POINT
dimension:      XY
bbox:           xmin: -79.81944 ymin: 34.22333 xmax: -77.91222 ymax: 36.08
geographic CRS: WGS 84
        name                   geometry
1    Raleigh POINT (-78.63333 35.76667)
2 Greensboro    POINT (-79.81944 36.08)
3 Wilmington POINT (-77.91222 34.22333)

mod_cit <- cities %>%
  mutate(long = st_coordinates(.)[,1],
         lat = st_coordinates(.)[,2]) %>%
  st_drop_geometry()

mod_cit # a regular data.frame

        name      long      lat
1    Raleigh -78.63333 35.76667
2 Greensboro -79.81944 36.08000
3 Wilmington -77.91222 34.22333

Pandas - Split column entry (every other separator)

I have a pandas data frame that looks something like this
| id | name | latlon |
0 sat -28,14 | -23, 12 | -21, 13...
The latlon column entry contains multiple latitude/longitude pairs, separated with the | symbol. I need to split them into lists as follows: lat = [-28, -23, -21] and lon = [14, 12, 13].
Running the following command will create a list of all the values:
sat_df["latlon"]= sat_df["latlon"].str.split("|", expand=False)
Example entry (one row of the resulting column):
[-58.562242560404705, 52.82662430990185, -61.300361184039964, 64.0645716165538, -62.8683906074927, 76.96557954998904, -63.078154849236505, 90.49660509514713, -61.95530287454162, 103.39930010176977, -59.727998547544765, 114.629246065411, -56.63116878989326, 124.07501384844198, -52.9408690779807, 131.75498199669985, -48.85803704806645, 137.9821558270659, -44.56621244973711, 143.03546934613863, -40.08092215592037, 147.27807367743728, -35.5075351924213, 150.86679792543603]
How can I continue to split the data, so that alternating entries are assigned to the lat and lon lists respectively, for the entire dataframe? Alternatively, is there some way to create two columns (lat/lon) which both hold a list object with all the values?
EDIT:
import pandas as pd
sat_df = pd.DataFrame({'卫星编号': {0: 38858, 1: 5, 2: 16}, 'path': {0: '-2023240,1636954,-1409847|-2120945,1594435,-1311586|-2213791,1547970,-1209918|', 1: '8847,-974294,-168045|69303,-972089,-207786|129332,-963859,-246237|189050,-949637,-283483|', 2: '283880,751564,538726|214030,782804,550729|142133,808810,558964|69271,829348,563411|'}, 'latlon': {0: '-28.566504816706743,-58.42623323318429|-26.424915546197877,-58.03051668423269|-24.24957760771616,-57.709052434729294|-22.049419348341488,-57.45429550739338|-19.82765114196696,-57.258197633964414|-17.58719794818057,-57.113255687570714|-15.33074070109176,-57.01245109909582|-13.060755383916138,-56.949188922655416|-10.779548173615462,-56.91723753411087|-8.48928513939462,-56.910669632641685|-6.192021225701933,-56.92380598464241|-3.8897270110140494,-56.951159278680606|-1.5843114029280712,-56.987381318629815|0.7223533959819478,-57.02721062232328|3.028411197431552,-57.06542107180802|5.331999106238248,-57.09677071391785|7.631224662503422,-57.115951252231326|9.924144733525859,-57.11753523668981|12.20873984934678,-57.09592379302077|14.482890506579363,-57.045292032888945|16.744349099342163,-56.95953284633186|18.99070929829218,-56.83219872719919|', 1: 
'-9.826016080133869,71.12640824438319|-12.077961267269185,74.17040194928683|-14.251942328865088,77.22102880126546|-16.362232784638383,80.31943171515469|-18.372371674164317,83.43158582640798|-20.311489634835258,86.62273098947678|-22.14461262803909,89.85609377674561|-23.896490600856566,93.19765633031801|-25.53339979617313,96.60696767976263|-27.063070616439813,100.12254137641649|-28.488648081761962,103.78528610926675|-29.778331008010497,107.54645547637602|-30.942622037767002,111.47495996053523|-31.95152016226762,115.51397654947516|-32.80866797590735,119.73211812295206|-33.486858278098815,124.06227007574186|-33.98257678066123,128.57116785317814|-34.27304876808886,133.17990028392123|-34.34804732039687,137.91355482600457|-34.19053759979979,142.79776551711302|-33.788689805715364,147.73758823197466|-33.12248489727676,152.7937677542324|', 2: '34.00069374375586,-130.03583418452314|34.3070000099521,-125.16691893340256|34.37547230320849,-120.37930544344802|34.219644836708575,-115.72548686095767|33.8599777210809,-111.25048787484094|33.307236654159695,-106.89130089454063|32.579218893589676,-102.68672977394559|31.69071108398145,-98.63657044455137|30.663892680279847,-94.76720076317056|29.49498481622457,-91.01231662520239|28.20247456939903,-87.39472628213446|26.796048279088225,-83.90476041381801|25.29620394685256,-80.5572008057606|23.686627724590036,-77.28791855670698|21.984668849769005,-74.1108962902788|20.209508481020038,-71.0367205896831|18.337433788359615,-68.00383542959851|16.385207987194672,-65.02251732177939|14.355346635752394,-62.078279068092414|12.266387624465171,-59.17870114389838|10.087160866120724,-56.262880710180255|7.8348695447113235,-53.336971029542006|'}})
# split latlon data into a list
sat_df.dropna(inplace=True)
sat_df["latlon"] = sat_df["latlon"].str.split("|", expand=False)
sat_df

# need to write each entry's latlon list as two lists (alternating lat and lon)
lat = []
lon = []
#for sat_df["latlon"]:
Let's go a step back from your str.split and make use of explode, which was added in pandas 0.25, and then merge it back based on the index.
df = sat_df['latlon'].str.split('|').explode().str.split(',', expand=True)
new_df = pd.merge(sat_df.drop('latlon', axis=1),
                  df, left_index=True,
                  right_index=True).rename(columns={0: 'Lat', 1: 'Lon'})
print(new_df.drop('path', axis=1))
   卫星编号                  Lat                  Lon
0  38858  -28.566504816706743   -58.42623323318429
0  38858  -26.424915546197877   -58.03051668423269
0  38858   -24.24957760771616  -57.709052434729294
0  38858  -22.049419348341488   -57.45429550739338
0  38858   -19.82765114196696  -57.258197633964414
..   ...                  ...                  ...
2     16   14.355346635752394  -62.078279068092414
2     16   12.266387624465171   -59.17870114389838
2     16   10.087160866120724  -56.262880710180255
2     16   7.8348695447113235  -53.336971029542006
2     16                 None
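On a tiny made-up frame the pattern is easy to follow: explode turns each list element into its own row while keeping the original index, and the second str.split separates lat from lon:

```python
import pandas as pd

# minimal stand-in for sat_df['latlon']
toy = pd.DataFrame({'latlon': ['-28,14|-23,12', '-21,13']})

# one row per pair, original index preserved, then split into two columns
out = toy['latlon'].str.split('|').explode().str.split(',', expand=True)
out.columns = ['Lat', 'Lon']
print(out)
```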
For this purpose we are using the pandas library.
Initially I have created a dataframe as you mentioned.
Code:
import pandas as pd
latlon = [-58.562242560404705,52.82662430990185, -61.300361184039964,64.0645716165538, -62.8683906074927,76.96557954998904, -63.078154849236505,90.49660509514713, -61.95530287454162,103.39930010176977, -59.727998547544765,114.629246065411, -56.63116878989326,124.07501384844198, -52.9408690779807,131.75498199669985, -48.85803704806645,137.9821558270659, -44.56621244973711,143.03546934613863, -40.08092215592037,147.27807367743728, -35.5075351924213,150.86679792543603,]
# print(latlon)
data = pd.DataFrame({'id':[0],'name':['sat'],'latlon':[latlon]})
print(data)
Output:
id name latlon
0 0 sat [-58.562242560404705, 52.82662430990185, -61.3...
Now I've converted the latlon values to strings in order to iterate over them, because iterating float values directly may cause errors. Then we assign the latitude and longitude values to the corresponding columns of the dataframe.
This code will work even if you have any number of records or rows in your dataframe.
Code:
# split latlon and add the values to the lat and lon columns
lats = []
lons = []
for i in range(len(data)):
    lat_lon = [str(x) for x in (data['latlon'].tolist()[i])]
    lat = []
    lon = []
    for j in range(len(lat_lon)):
        if j % 2 == 0:
            lat.append(float(lat_lon[j]))
        else:
            lon.append(float(lat_lon[j]))
    lats.append(lat)
    lons.append(lon)
data = data.drop('latlon', axis=1)  # drop the latlon column
data.insert(2, 'lat', lats)         # add the lat column
data.insert(3, 'lon', lons)         # add the lon column
# print(data)
data  # display the dataframe
Output:
id name lat lon
0 0 sat [-58.562242560404705, -61.300361184039964, -62... [52.82662430990185, 64.0645716165538, 76.96557...
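The even/odd split can also be written with list slicing, which removes the inner loop entirely; a sketch on the first six values of the example list:

```python
# first six values of the example latlon list
latlon = [-58.562242560404705, 52.82662430990185,
          -61.300361184039964, 64.0645716165538,
          -62.8683906074927, 76.96557954998904]

lat = latlon[0::2]  # even positions are latitudes
lon = latlon[1::2]  # odd positions are longitudes
print(lat)
print(lon)
```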
I hope this helps.

How do I write a for loop within a function to pick up values within a csv?

I have a file called sampleweather100 which has latitudes and longitudes of addresses. If I manually type these lats and longs into the location_list variable, I get the output I desire. However, I want to write a function that pulls out the output for all rows of my csv without me manually entering them:
import pandas as pd
from wwo_hist import retrieve_hist_data

my_cities = pd.read_csv('sampleweather100.csv')

#lat = -31.967819
#lng = 115.87718
#location_list = ["-31.967819,115.87718"]

frequency = 24
start_date = '11-JAN-2018'
end_date = '11-JAN-2019'
api_key = 'MyKey'
location_list = ["('sampleweather100.csv')['Lat'],('sampleweather100.csv')['Long']"]

hist_weather_data = retrieve_hist_data(api_key,
                                       location_list,
                                       start_date,
                                       end_date,
                                       frequency,
                                       location_label=False,
                                       export_csv=True,
                                       store_df=True)
My location_list assignment does not work. Is there a better way, or a for loop, that will fetch each row's lat and long into that location_list variable?
Reprex of dataset:
my_cities
Out[89]:
City Lat Long
0 Lancaster 39.754545 -82.636371
1 Canton 40.851178 -81.470345
2 Edison 40.539561 -74.336307
3 East Walpole 42.160667 -71.213680
4 Dayton 39.270486 -119.577078
5 Fort Wainwright 64.825343 -147.673877
6 Crystal 45.056106 -93.350020
7 Medford 42.338916 -122.839771
8 Spring Valley 41.103816 -74.045399
9 Hillsdale 41.000879 -74.026089
10 Newyork 40.808582 -73.951553
Your way of building the list just does not make sense. You are using the filename of the csv, which is just a string and holds no reference to the file itself or to the dataframe you have created from it.
Since you built a dataframe called my_cities from your csv using pandas, you need to extract your list of pairs from the dataframe my_cities:
location_list = [','.join([str(lat), str(lon)]) for lat, lon in zip(my_cities['Lat'], my_cities['Long'])]
This is the list you get with the above line using your sample dataframe:
['39.754545,-82.636371', '40.851178000000004,-81.470345',
'40.539561,-74.33630699999999', '42.160667,-71.21368000000001',
'39.270486,-119.577078', '64.825343,-147.673877', '45.056106,-93.35002',
'42.338916,-122.839771', '41.103815999999995,-74.045399',
'41.000879,-74.026089', '40.808582,-73.951553']
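A quick check of that comprehension on the first two rows of the sample dataframe, confirming it produces the plain 'lat,lon' strings shown in the commented-out location_list example:

```python
import pandas as pd

# first two rows of the sample dataframe
my_cities = pd.DataFrame({'City': ['Lancaster', 'Canton'],
                          'Lat':  [39.754545, 40.851178],
                          'Long': [-82.636371, -81.470345]})

location_list = [','.join([str(lat), str(lon)])
                 for lat, lon in zip(my_cities['Lat'], my_cities['Long'])]
print(location_list)
```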
You could use one of these to convert the dataframe into a list of comma-separated pairs. Note that only the Lat and Long columns are selected; with the City column included, unpacking into (lat, lon) would fail:

location_list = [
    '{},{}'.format(lat, lon)
    for i, (lat, lon) in my_cities[['Lat', 'Long']].iterrows()
]

or

location_list = [
    '{},{}'.format(lat, lon)
    for lat, lon in my_cities[['Lat', 'Long']].values
]

Big data visualization for multiple sampled data points from a large log

I have a log file which I need to plot in Python as a multi-line plot, with one line for each unique data point. The problem is that in some samples some points are missing and new points are added in others, as shown in the example below, where each line denotes a sample of n points and n is variable:
2015-06-20 16:42:48,135 current stats=[ ('keypassed', 13), ('toy', 2), ('ball', 2),('mouse', 1) ...]
2015-06-21 16:42:48,135 current stats=[ ('keypassed', 20), ('toy', 5), ('ball', 7), ('cod', 1), ('fish', 1) ... ]
In the first sample above, 'mouse' is present but absent in the second line, while each new sample adds new data points like 'cod' and 'fish'.
So how can this be done in Python in the quickest and cleanest way? Are there any existing Python utilities which can help plot this timed log file? Also, being a log file, the samples number in the thousands, so the visualization should be able to display them properly.
I am interested in applying multivariate hexagonal binning to this, with a different colour hexagon for each unique column ("ball", "mouse", etc.). scikit offers hexagonal binning but I can't figure out how to render different colors for each hexagon based on the unique data point. Any other visualization technique would also help.
Getting the data into pandas:
import pandas as pd

df = pd.DataFrame(columns=['timestamp', 'name', 'value'])
with open(logfilepath) as f:
    for line in f.readlines():
        timestamp = line.split(',')[0]
        # the data part of each line can be evaluated directly as a Python list
        data = eval(line.split('=')[1])
        # convert the input data from wide format to long format
        for name, value in data:
            df = df.append({'timestamp': timestamp, 'name': name, 'value': value},
                           ignore_index=True)

# convert from long format back to wide format, and fill null values with 0
df2 = df.pivot_table(index='timestamp', columns='name')
df2 = df2.fillna(0)
df2
Out[142]:
                    value
name                 ball cod fish keypassed mouse toy
timestamp
2015-06-20 16:42:48     2   0    0        13     1   2
2015-06-21 16:42:48     7   1    1        20     0   5
Plot the data:
import matplotlib.pyplot as plt
df2.value.plot()
plt.show()
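The parse-and-pivot pipeline above can be exercised without a log file by feeding the two sample lines in directly. This sketch avoids the deprecated df.append (removed in pandas 2.0) by collecting plain dicts first, and uses ast.literal_eval as a safer stand-in for eval:

```python
import ast

import pandas as pd

# the two sample log lines from the question (with the stats lists well-formed)
log_lines = [
    "2015-06-20 16:42:48,135 current stats=[ ('keypassed', 13), ('toy', 2), ('ball', 2), ('mouse', 1)]",
    "2015-06-21 16:42:48,135 current stats=[ ('keypassed', 20), ('toy', 5), ('ball', 7), ('cod', 1), ('fish', 1)]",
]

records = []
for line in log_lines:
    timestamp = line.split(',')[0]
    # literal_eval parses the list-of-tuples without executing arbitrary code
    data = ast.literal_eval(line.split('=')[1])
    for name, value in data:
        records.append({'timestamp': timestamp, 'name': name, 'value': value})

# long format -> wide format; missing points become 0
df = pd.DataFrame(records)
df2 = df.pivot_table(index='timestamp', columns='name', values='value').fillna(0)
print(df2)
```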
