Python - take the data inside a dataframe cell out into other cells

This is the data in a single cell of a dataframe with 14 columns (a cell being one element of a column). There are 45k+ cells like this, so doing it manually would be hell.
(screenshot: the contents of one cell)
I'd like to do 3 things with this cell:
move the text part with address, state, and zip to another column;
remove the parentheses () around the coordinates;
separate longitude and latitude into 2 columns.
How can this be done?

Here's a simple, working example with 2 data points:
text1 = """30881 EKLUTNA LAKE RD
CHUGIAK, AK 99567
(61.4478, -149.3136)"""
text2 = """30882 FAKE STR
CHUGIAK, AK 98817
(43.4478, -119.3136)"""
d = {'col1': [text1, text2]}
df = pd.DataFrame(data=d)
def fix(row):
#We split the text by newline
address, cp, latlong = row.col1.split('\n')
#We get the latitude and longitude by splitting by a comma
latlong_vec = latlong[1:-1].split(',')
#This part isn't really necessary but we create the variables for claity
lat = float(latlong_vec[0])
long = float(latlong_vec[1])
return pd.Series([address + ". " + cp, lat, long])
df[['full address', 'lat', 'long']] = df.apply(fix, axis = 1)
Output of the 3 new columns:
df['full address']
0    30881 EKLUTNA LAKE RD. CHUGIAK, AK 99567
1           30882 FAKE STR. CHUGIAK, AK 98817
Name: full address, dtype: object
df['lat']
0    61.4478
1    43.4478
Name: lat, dtype: float64
df['long']
0   -149.3136
1   -119.3136
Name: long, dtype: float64
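
With 45k+ rows, a vectorized approach may also be worth trying instead of apply. A minimal sketch (my addition, assuming every cell follows the "address\ncity, state zip\n(lat, long)" pattern of the sample above) using str.extract with named groups:

# Vectorized sketch; the regex is an assumption based on the sample data above
parts = df['col1'].str.extract(
    r'(?P<address>.*)\n(?P<cp>.*)\n\((?P<lat>[-\d.]+),\s*(?P<long>[-\d.]+)\)'
)
df['full address'] = parts['address'] + '. ' + parts['cp']
df['lat'] = parts['lat'].astype(float)
df['long'] = parts['long'].astype(float)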

Related

Getting data into a map

I got my .dat data formatted into arrays that I could use in graphs and whatnot.
I got my data from the website below; it requires an account if you want to download it yourself, but the relevant data is also shown here.
https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1028
data in python:
import pandas as pd
df = pd.read_csv("ocean_flux_co2_2d.dat", header=None)
print(df.head())
         0      1         2         3
0  -178.75  -77.0  0.000003   32128.7
1  -176.25  -77.0  0.000599   32128.7
2  -173.75  -77.0  0.001649   39113.5
3  -171.25  -77.0  0.003838   58934.0
4  -168.75  -77.0  0.007192  179959.0
I then decided to put this data into arrays that could be put into graphs and other functions.
Like so:
import numpy as np

lat = []
lon = []
sed = []
area = []
with open('/home/srowpie/SrowFinProj/Datas/ocean_flux_tss_2d.dat') as f:
    for line in f:
        parts = line.split(',')
        lat.append(float(parts[0]))
        lon.append(float(parts[1]))
        sed.append(float(parts[2]))
        area.append(float(parts[3]))
lat = np.array(lat)
lon = np.array(lon)
sed = np.array(sed)
area = np.array(area)
My question now is how can I put this data into a map with data points? Column 1 is latitude, Column 2 is longitude, Column 3 is sediment flux, and Column 4 is the area covered. Or do I have to bootleg it by making a graph that takes into account the variables lat, lon, and sed?
You don't need to re-read the file to get the data into an array. Just use df.values and you have a NumPy array of all the data in the dataframe.
Example -
array([[-1.78750e+02, -7.70000e+01,  3.00000e-06,  3.21287e+04],
       [-1.76250e+02, -7.70000e+01,  5.99000e-04,  3.21287e+04],
       [-1.73750e+02, -7.70000e+01,  1.64900e-03,  3.91135e+04],
       [-1.71250e+02, -7.70000e+01,  3.83800e-03,  5.89340e+04],
       [-1.68750e+02, -7.70000e+01,  7.19200e-03,  1.79959e+05]])
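If you do still want the four separate 1-D arrays, one short way (my addition, not part of the original answer) is to unpack the transposed values array:

# df.values has shape (N, 4); transposing gives one row per column
lat, lon, sed, area = df.values.T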
That said, I wouldn't recommend storing individual columns as variables. Instead, just set the column names on the dataframe and then use them to extract a pandas Series of the data in each column.
df.columns = ["Latitude", "Longitude", "Sediment Flux", "Area covered"]
This is what the table would look like after that:
   Latitude  Longitude  Sediment Flux  Area covered
0   -178.75      -77.0          3e-06       32128.7
1   -176.25      -77.0       0.000599       32128.7
2   -173.75      -77.0       0.001649       39113.5
3   -171.25      -77.0       0.003838       58934.0
4   -168.75      -77.0       0.007192      179959.0
Simply do df[column_name] to get the data in that column.
For example -> df["Latitude"]
Output -
0 -178.75
1 -176.25
2 -173.75
3 -171.25
4 -168.75
Name: Latitude, dtype: float64
Once you have done all this, you can use folium to plot the rows on real interactive maps.
import folium as fl

map = fl.Map(df.iloc[0, :2], zoom_start=100)
for index in df.index:
    row = df.loc[index, :]
    fl.Marker(row[:2].values, f"{dict(row[2:])}").add_to(map)
map
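
Note that the bare map on the last line only renders inside a notebook; in a plain script you can save the map to an HTML file instead and open it in a browser:

map.save('map.html')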

fetching a csv that has a vector in one column c(,)

I was using R to get my data prepared; however, I find myself forced to use Python instead.
The csv files were written from an sf dataframe, where a geometry column stores both long and lat.
My files have the following structure:
a,geometry,b
50,c(-95.11, 10.19),32.24
60,,c(-95.12, 10.27),22.79
70,c(-95.13, 10.28),14.91
80,c(-95.14, 10.33),18.35
90,c(-95.15, 10.5),28.35
99,c(-95.16, 10.7),48.91
The aim here is to read the file while knowing that c(-95.11, 10.19) is two values, lon and lat, so that they can be stored in two different columns. However, having the separator inside a value that is not quoted as a string makes this really hard to do.
The expected output should be :
a,long,lat,b
50,-95.11, 10.19,32.24
60,,-95.12, 10.27,22.79
70,-95.13, 10.28,14.91
80,-95.14, 10.33,18.35
90,-95.15, 10.5,28.35
99,-95.16, 10.7,48.91
Does this work (input file: data.csv; output file: data_out.csv)?
import csv

with open('data.csv', 'r') as fin, open('data_out.csv', 'w', newline='') as fout:
    reader, writer = csv.reader(fin), csv.writer(fout)
    next(reader)                                # skip the original header
    writer.writerow(['a', 'long', 'lat', 'b'])  # write the new header
    for row in reader:
        row[1] = row[1][2:]    # strip the leading "c(" from the long field
        row[2] = row[2][1:-1]  # strip the leading space and trailing ")" from the lat field
        writer.writerow(row)
In your sample output there is a blank after the second column: is this intended? Also, your sample input has a double comma after the first column in line two?
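
Since you mention being forced into Python: a pandas/regex sketch is another option (my addition; the pattern is an assumption based on your sample file, and the ,+ tolerates the stray double comma in line two):

import re
import pandas as pd

pattern = re.compile(r'^(\d+),+c\(([-\d.]+),\s*([-\d.]+)\),([-\d.]+)$')
rows = []
with open('data.csv') as f:
    next(f)  # skip the a,geometry,b header
    for line in f:
        m = pattern.match(line.strip())
        if m:
            rows.append(m.groups())
df = pd.DataFrame(rows, columns=['a', 'long', 'lat', 'b'])
df.to_csv('data_out.csv', index=False)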
If you were looking for an R-based solution, you may consider extracting the coordinates from the {sf} geometry column into regular columns, and saving accordingly.
Consider this example, built on three semi-random North Carolina cities:
library(sf)
library(dplyr)

cities <- data.frame(name = c("Raleigh", "Greensboro", "Wilmington"),
                     x = c(-78.633333, -79.819444, -77.912222),
                     y = c(35.766667, 36.08, 34.223333)) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)

cities # a class sf data.frame
Simple feature collection with 3 features and 1 field
geometry type:  POINT
dimension:      XY
bbox:           xmin: -79.81944 ymin: 34.22333 xmax: -77.91222 ymax: 36.08
geographic CRS: WGS 84
        name                   geometry
1    Raleigh POINT (-78.63333 35.76667)
2 Greensboro    POINT (-79.81944 36.08)
3 Wilmington POINT (-77.91222 34.22333)
mod_cit <- cities %>%
  mutate(long = st_coordinates(.)[, 1],
         lat = st_coordinates(.)[, 2]) %>%
  st_drop_geometry()

mod_cit # a regular data.frame
        name      long      lat
1    Raleigh -78.63333 35.76667
2 Greensboro -79.81944 36.08000
3 Wilmington -77.91222 34.22333

How to pass a file as argument to parameter in Python in a specific format?

I have a file or dataframe like the one below, containing city name, latitude, and longitude.
city.head(4)
City Latitude Longitude
Seattle 47.620422 122.349358
Dubai 25.276987 55.296249
Mexico 19.432608 99.1332
Tokyo 35.652832 139.839478
I want to pass all the latitudes and longitudes from the city dataframe to an API and get the corresponding results. Currently I am able to do it by manually feeding lat and long as input to the params.
How can I automate the entire process? Lat and Long should be passed into params in the format lat:long, with the values rounded to 3 decimal places as extracted from the city dataframe.
import requests

headers = {
    'Authorization': 'Api-Key ',
}
params = (
    ('coords', '49.910:10.920, 47.620:122.349'),
)
response = requests.get('https://api.example.com/we/v12/forecasts', headers=headers, params=params)
Sample Output of API
'{"results":[{"place":{"type":"locode","value":"PLWRO"},"measures":[{"ts":1572177600000,"t2m":19.6,"t_min":12.16,"t_max":20.59,"wspd":26,"dir":"W","wgust":37,"rh2m":44,"prsmsl":1015,"skcover":"clear","precip":0.0,"snowd":0,"thunderstorm":"N","fog":"L"}]},{"place":{"type":"locode","value":"DEHAM"},"measures":[{"ts":1572177600000,"t2m":10.49,"t_min":8.18,"t_max":10.6,"wspd":21,"dir":"W","wgust":39,"rh2m":69,"prsmsl":1016,"skcover":"partly_cloudy","precip":0.0,"snowd":0,"thunderstorm":"N","fog":"L"}]}]}'
How can this be done?
Round the float columns to 3 decimal places and then convert to string. Combine the columns needed. Iterate over the column and send the data to the API.
>>> df = df.round(3).astype(str)
>>> df
City Latitude Longitude
0 Seattle 47.62 122.349
1 Dubai 25.277 55.296
2 Mexico 19.433 99.133
3 Tokyo 35.653 139.839
>>> df['LatLong'] = df.Latitude.add(':') + df.Longitude
>>> df
City Latitude Longitude LatLong
0 Seattle 47.62 122.349 47.62:122.349
1 Dubai 25.277 55.296 25.277:55.296
2 Mexico 19.433 99.133 19.433:99.133
3 Tokyo 35.653 139.839 35.653:139.839
>>> df.LatLong.str.cat(sep=', ')
'47.62:122.349, 25.277:55.296, 19.433:99.133, 35.653:139.839'
One liner (latitude first, to match the lat:long format):
>>> df.Latitude.str.cat(df.Longitude, sep=':').str.cat(sep=', ')
I agree with Vishnudev's answer, but that particular way of doing it requires you to transform your original data, which may not be the best way to deal with the situation.
In order to preserve the original data, you can create a small set of calculated columns in your dataframe like this:
df['latp'] = round(df['Latitude'], 3)
df['lonp'] = round(df['Longitude'], 3)
df['param'] = df['latp'].astype(str) + ':' + df['lonp'].astype(str)
...and pass the 'param' column as a parameter.
Cheers!
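
Putting the pieces together, a rough end-to-end sketch (the URL and Authorization header come from the question with the key elided; the rest is my assumption):

import requests
import pandas as pd

df['latp'] = round(df['Latitude'], 3)
df['lonp'] = round(df['Longitude'], 3)
df['param'] = df['latp'].astype(str) + ':' + df['lonp'].astype(str)

# Join every city's lat:long pair into the single comma-separated string
coords = df['param'].str.cat(sep=', ')
headers = {'Authorization': 'Api-Key '}  # key elided, as in the question
response = requests.get('https://api.example.com/we/v12/forecasts',
                        headers=headers, params={'coords': coords})
print(response.json())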

How do I write a for loop within a function to pick up values within a csv?

I have a file called sampleweather100 which has latitudes and longitudes of addresses. If I manually type these lats and longs into the location_list variable, I get the output I desire. However, I want to write a function that pulls the values from all rows of my csv without me manually entering them:
import pandas as pd
from wwo_hist import retrieve_hist_data

my_cities = pd.read_csv('sampleweather100.csv')

#lat = -31.967819
#lng = 115.87718
#location_list = ["-31.967819,115.87718"]

frequency = 24
start_date = '11-JAN-2018'
end_date = '11-JAN-2019'
api_key = 'MyKey'
location_list = ["('sampleweather100.csv')['Lat'],('sampleweather100.csv')['Long']"]

hist_weather_data = retrieve_hist_data(api_key,
                                       location_list,
                                       start_date,
                                       end_date,
                                       frequency,
                                       location_label=False,
                                       export_csv=True,
                                       store_df=True)
My line location_list = ["('sampleweather100.csv')['Lat'],('sampleweather100.csv')['Long']"] does not work. Is there a better way, or a for loop, that will fetch each row's lat and long into that location_list variable:
Reprex of dataset:
my_cities
Out[89]:
City Lat Long
0 Lancaster 39.754545 -82.636371
1 Canton 40.851178 -81.470345
2 Edison 40.539561 -74.336307
3 East Walpole 42.160667 -71.213680
4 Dayton 39.270486 -119.577078
5 Fort Wainwright 64.825343 -147.673877
6 Crystal 45.056106 -93.350020
7 Medford 42.338916 -122.839771
8 Spring Valley 41.103816 -74.045399
9 Hillsdale 41.000879 -74.026089
10 Newyork 40.808582 -73.951553
Your way of building the list just does not make sense. You are using the filename of the csv, which is just a string and holds no reference to the file itself or to the dataframe you created from it.
Since you built a dataframe called my_cities from your csv using pandas, you need to extract your list of pairs from the dataframe my_cities:
location_list = [','.join([str(lat), str(lon)]) for lat, lon in zip(my_cities['Lat'], my_cities['Long'])]
This is the list you get with the above line using your sample dataframe:
['39.754545,-82.636371', '40.851178000000004,-81.470345',
'40.539561,-74.33630699999999', '42.160667,-71.21368000000001',
'39.270486,-119.577078', '64.825343,-147.673877', '45.056106,-93.35002',
'42.338916,-122.839771', '41.103815999999995,-74.045399',
'41.000879,-74.026089', '40.808582,-73.951553']
You could use one of these to convert the dataframe into a list of comma-separated pairs (note the [['Lat', 'Long']] selection, needed because the dataframe also has a City column):
location_list = [
    '{},{}'.format(lat, lon)
    for i, (lat, lon) in my_cities[['Lat', 'Long']].iterrows()
]
or
location_list = [
    '{},{}'.format(lat, lon)
    for lat, lon in my_cities[['Lat', 'Long']].values
]
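
One caveat (my addition): formatting raw floats with '{}' reproduces artifacts like 40.851178000000004 seen in the list above. If the API is picky about that, a format spec pins the precision; a small sketch, assuming 6 decimal places is enough:

location_list = [
    '{:.6f},{:.6f}'.format(lat, lon)
    for lat, lon in my_cities[['Lat', 'Long']].values
]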

strip data frame cell then create columns

I'm trying to take the info from a dataframe and break it out into columns with the following header names. The info is all crammed into 1 cell.
New to Python, so be gentle. Thanks for the help.
My code:
import requests
import pandas as pd
from bs4 import BeautifulSoup as soup

r = requests.get('https://nclbgc.org/search/licenseDetails?licenseNumber=80479')
page_data = soup(r.text, 'html.parser')
company_info = [' '.join(' '.join(info.get_text(", ", strip=True).split()) for info in page_data.find_all('tr'))]
df = pd.DataFrame(company_info, columns = ['ic_number, status, renewal_date, company_name, address, county, telephon, limitation, residential_qualifiers'])
print(df)
The result I get:
['License Number, 80479 Status, Valid Renewal Date, n/a Name, DLR Construction, LLC Address, 3217 Vagabond Dr Monroe, NC 28110 County, Union Telephone, (980) 245-0867 Limitation, Limited Classifications, Residential Qualifiers, Arteaga, Vicky Rodriguez']
You can use read_html with some post-processing:
import pandas as pd

url = 'https://nclbgc.org/search/licenseDetails?licenseNumber=80479'
# select the first table from the list of tables, drop rows that are all NaN
df = pd.read_html(url)[0].dropna(how='all')
# forward fill NaNs in the first column
df[0] = df[0].ffill()
# merge the values in the second column
df = df.groupby(0)[1].apply(lambda x: ' '.join(x.dropna())).to_frame().rename_axis(None).T
print(df)
Address Classifications County License Number \
1 3217 Vagabond Dr Monroe, NC 28110 Residential Union 80479
Limitation Name Qualifiers Renewal Date \
1 Limited DLR Construction, LLC Arteaga, Vicky Rodriguez
Status Telephone
1 Valid (980) 245-0867
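
If you want the snake_case headers from your original attempt, you can rename the scraped columns afterwards; the mapping below is my assumption, pairing the labels visible in the output above with your desired names:

df = df.rename(columns={
    'License Number': 'ic_number',
    'Status': 'status',
    'Renewal Date': 'renewal_date',
    'Name': 'company_name',
    'Address': 'address',
    'County': 'county',
    'Telephone': 'telephon',
    'Limitation': 'limitation',
    'Qualifiers': 'residential_qualifiers',
})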
Replace the df line like below:
df = pd.DataFrame(company_info, columns = ['ic_number', 'status', 'renewal_date', 'company_name', 'address', 'county', 'telephon', 'limitation', 'residential_qualifiers'])
Each column mentioned under columns should be within its own quotes; otherwise the whole thing is treated as one single column name. (Note that company_info must then also hold nine separate values, one per column, rather than one joined string, for the row to line up.)
