How can I organize my data [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Let's say I have a list of records, each composed of a client number, a store number, a first name, a last name and an address, in this format:
11, 2, Lisa, Anderson, NewYork
13, 4, John, Smith, Alabama
54, 2, Lucy, Nicholsson, NewYork
etc.
What is the best way to organize this data in Python so that I can easily access it, for example by entering a client number and getting the location back, and other lookups of that kind?

You can use pandas. It provides database-like (or spreadsheet-like) tables that can be used to store and query data, like this:
import pandas as pd
df = pd.DataFrame([
    [11, 2, 'Lisa', 'Anderson', 'NewYork'],
    [13, 4, 'John', 'Smith', 'Alabama'],
    [54, 2, 'Lucy', 'Nicholsson', 'NewYork']
], columns=['cl_number', 'store', 'first_name', 'last_name', 'address'])
df.index = df['cl_number']
# rows will be indexed with cl_number
df.loc[11]
# returns the record of the client with number 11
df.loc[11, 'address']
# returns the address of the client with number 11
df[df['address'] == 'NewYork']
# returns a DataFrame of all clients with address 'NewYork'
However, you may also need a full-featured database (see SQLite, for example).
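For the SQLite route, here is a minimal sketch using the standard-library sqlite3 module. The table name, the column names and the helper functions address_of and clients_at are assumptions made for illustration; the rows are the three from the question.

```python
import sqlite3

# Sample rows from the question: (client number, store, first name, last name, address)
clients = [
    (11, 2, "Lisa", "Anderson", "NewYork"),
    (13, 4, "John", "Smith", "Alabama"),
    (54, 2, "Lucy", "Nicholsson", "NewYork"),
]

# An in-memory database; pass a file path instead to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients ("
    "cl_number INTEGER PRIMARY KEY, store INTEGER, "
    "first_name TEXT, last_name TEXT, address TEXT)"
)
conn.executemany("INSERT INTO clients VALUES (?, ?, ?, ?, ?)", clients)


def address_of(cl_number):
    """Return the address for a client number, or None if it is unknown."""
    row = conn.execute(
        "SELECT address FROM clients WHERE cl_number = ?", (cl_number,)
    ).fetchone()
    return row[0] if row else None


def clients_at(address):
    """Return (first_name, last_name) pairs for every client at an address."""
    return conn.execute(
        "SELECT first_name, last_name FROM clients "
        "WHERE address = ? ORDER BY cl_number",
        (address,),
    ).fetchall()


print(address_of(13))
print(clients_at("NewYork"))
```

Compared with a DataFrame, this keeps the data on disk if you connect to a file, at the cost of writing SQL for each lookup.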

If your data is reasonably consistent and there isn't so much that you want a fully-fledged database, you can get quite far with a namedtuple:
from collections import namedtuple
Client = namedtuple('Client', ('id', 'storeno', 'first_name', 'last_name',
                               'location'))
# Read the data
with open('db.csv') as fi:
    rows = fi.readlines()
db = {}
for row in rows:
    f = row.split(',')
    db[int(f[0])] = Client(int(f[0]), int(f[1]), f[2].strip(),
                           f[3].strip(), f[4].strip())
def find(**kwargs):
    """Return a list of all Clients matching a given set of criteria."""
    matches = []
    for client in db.values():
        for k, v in kwargs.items():
            if getattr(client, k) != v:
                break
        else:
            # no criterion failed, so this client matches
            matches.append(client)
    return matches
# Client with ID 11
print(db[11])
# All clients in New York
print(find(location='NewYork'))
# All clients called Lisa in New York
print(find(location='NewYork', first_name='Lisa'))

Related

How to update a dynamic list in python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I have a list Business = ['Company name','Mycompany','Revenue','1000','Income','2000','employee','3000','Facilities','4000','Stock','5000']. Printed as pairs, the list looks like this:
Company Mycompany
Revenue 1000
Income 2000
employee 3000
Facilities 4000
Stock 5000
The dynamic list gets updated on every iteration, and some of the items in the list may be missing. For example, in execution 1 the list is updated as below:
Company Mycompany
Income 2000 #revenue is missing
employee 3000
Facilities 4000
Stock 5000
In the above list, Revenue has been removed because the company has no revenue. In the second example:
Company Mycompany
Revenue 1000
Income 2000
Facilities 4000 #Employee is missing
Stock 5000
In example 2 above, employee is missing. How can I create an output list that replaces the missing values with 0? In example 1 Revenue is missing, so the output list should contain ['Revenue', '0'] at its position. For better understanding, see below.
Output list for example 1 (Revenue replaced with 0):
Company Mycompany| **Revenue 0**| Income 2000| employee 3000| Facilities 4000| Stock 5000
Output list for example 2 (employee replaced with 0):
Company Mycompany| Revenue 1000| Income 2000| **employee 0**| Facilities 4000| Stock 5000
How can I produce an output list that has 0 for the missing items, without changing the structure of the list? My code so far:
for line in Business:
    if 'Company' not in line:
        Business.insert( 0, 'company')
        Business.insert( 1, '0')
    if 'Revenue' not in line:
        #got stuck here
    if 'Income' not in line:
        #got stuck here
    if 'Employee' not in line:
        #got stuck here
    if 'Facilities' not in line:
        #got stuck here
    if 'Stock' not in line:
        #got stuck here
Thanks a lot in advance
If you are getting the input as a list, you can convert the list into a dict like this, which gives you a better handle on the data (receiving it as a dictionary in the first place would be a better choice, though):
Business = ['Company name','Mycompany','Revenue',1000,'Income',2000,'employee',3000,'Facilities',4000,'Stock',5000]
BusinessDict = {Business[i]:Business[i+1] for i in range(0,len(Business)-1,2)}
print(BusinessDict)
As said in the comments, a dict is a much better data structure for the problem. If you really need the list, you could use a temporary dict like this:
example = ['Company name','Mycompany','Income','2000','employee','3000','Facilities','4000','Stock','5000']
template = ['Company name', 'Revenue', 'Income', 'employee', 'Facilities', 'Stock']
# build a temporary dict
exDict = dict(zip(example[::2], example[1::2]))
# work on it
result = []
for i in template:
    result.append(i)
    if i in exDict.keys():
        result.append(exDict[i])
    else:
        result.append(0)
A bit more efficient (but harder to understand for beginners) would be to create the temporary dict like this:
i = iter(example)
example_dict = dict(zip(i, i))
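This works because both arguments to zip are the same iterator: zip pulls items from it alternately, so each pair is made of two consecutive elements of the list.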
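Putting the two ideas together, a small sketch of the whole fill-in step could look like this. The function name fill_missing and the default of '0' (a string, to match the ['Revenue', '0'] pair requested in the question) are assumptions for illustration.

```python
example = ['Company name', 'Mycompany', 'Income', '2000', 'employee', '3000',
           'Facilities', '4000', 'Stock', '5000']
template = ['Company name', 'Revenue', 'Income', 'employee', 'Facilities', 'Stock']


def fill_missing(flat, keys, default='0'):
    """Return a flat key/value list covering every key, in template order."""
    it = iter(flat)
    pairs = dict(zip(it, it))  # consecutive items become (key, value) pairs
    result = []
    for key in keys:
        # keep the original value if present, otherwise insert the default
        result.extend([key, pairs.get(key, default)])
    return result


print(fill_missing(example, template))
```

The output keeps the flat key, value, key, value layout of the input, so code that consumes the list does not need to change.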
You can use a dictionary like this:
d={'Company':0,'Revenue':0,'Income':0,'employee':0,'Facilities':0,'Stock':0}
given=[['Company','Mycompany'],['Income',2000],['employee',3000],['Facilities',4000],['Stock',5000]]
# overwrite the defaults with the values that are actually present
for i in given:
    d[i[0]]=i[1]
ans=[]
for key,value in d.items():
    ans.append([key,value])

Populate 2 columns of dataframe at the same time using apply function [duplicate]

This question already has answers here:
Apply Python function to one pandas column and apply the output to multiple columns
(4 answers)
Closed 1 year ago.
I have some code which is (simplified) like this. The actual data lists are tens of thousands in size, not just 3.
There is a dictionary of staff which I make a DataFrame from.
There is a list of dictionary objects which contain additional staff information.
Also:
The staff list and the extra staff information (master_info_list) overlap but each has items that are unique to them.
The "index" I am using (StaffNumber) is actually prefixed with "SN_" in the extra staff information, so I can't compare them directly.
The duplication of StaffNumber in the master_info_list is intended (that's just how I receive it!).
What I want to do is populate two new columns into the dataframe which get their data from the extra staff information. I can do this by making 2 separate calls to get_department_and_manager, one for Department and one for Manager. That works. But, it "feels" like I should be able to take 2 fields from the output of get_department_and_manager and populate the dataframe in one go, but I'm struggling to get the syntax right. What is the correct syntax (if possible)? Also, iterating through the list the way I do (with a for loop) seems inefficient. Is there a better way?
The examples I have seen all seem to create new columns from existing data in the dataframe, or they are simple examples where no mashing of data is required before comparing the two "lists" (or list and dictionary).
import pandas as pd
def get_department_and_manager(row, master_list):
    dept = 'bbb'
    manager = 'aaa'
    for i in master_list:
        if i['StaffNumber'] == 'SN_' + row['StaffNumber']:
            dept = i['data']['Department']
            manager = i['data']['Manager']
            break
    return [dept, manager]
staff = {'Name': ['Alice', 'Bob', 'Dave'],
         'StaffNumber': ['001', '002', '004']}
master_info_list = [{'StaffNumber': 'SN_001', 'data': {'StaffNumber': 'SN_001', 'Department': 'Sales', 'Manager': 'Luke' }},
                    {'StaffNumber': 'SN_002', 'data': {'StaffNumber': 'SN_002', 'Department': 'Marketing', 'Manager': 'Mary' }},
                    {'StaffNumber': 'SN_003', 'data': {'StaffNumber': 'SN_003', 'Department': 'IT', 'Manager': 'Neal' }}]
df = pd.DataFrame(data=staff)
df[['Department']['Manager']] = df.apply(get_department_and_manager, axis='columns', args=[master_info_list])
print(df)
If I understand you correctly, you can use .merge:
x = pd.DataFrame([v["data"] for v in master_info_list])
x["StaffNumber"] = x["StaffNumber"].str.split("_").str[-1]
print(df.merge(x, on="StaffNumber", how="left"))
Prints:
    Name StaffNumber Department Manager
0  Alice         001      Sales    Luke
1    Bob         002  Marketing    Mary
2   Dave         004        NaN     NaN
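If you would rather keep the apply-based approach from the question, one possible sketch is below. It assumes that result_type='expand' fits your case, and it replaces the inner for loop with a dict lookup built once. The placeholder value 'unknown' and the name lookup are illustrative choices; setdefault keeps the first entry for a duplicated StaffNumber, mirroring the break in the original loop.

```python
import pandas as pd

staff = {'Name': ['Alice', 'Bob', 'Dave'],
         'StaffNumber': ['001', '002', '004']}
master_info_list = [
    {'StaffNumber': 'SN_001', 'data': {'StaffNumber': 'SN_001', 'Department': 'Sales', 'Manager': 'Luke'}},
    {'StaffNumber': 'SN_002', 'data': {'StaffNumber': 'SN_002', 'Department': 'Marketing', 'Manager': 'Mary'}},
    {'StaffNumber': 'SN_003', 'data': {'StaffNumber': 'SN_003', 'Department': 'IT', 'Manager': 'Neal'}},
]

# Build the lookup once so each row costs one dict access instead of a list scan.
# setdefault keeps the first entry when a StaffNumber appears more than once.
lookup = {}
for item in master_info_list:
    lookup.setdefault(item['StaffNumber'], item['data'])


def get_department_and_manager(row):
    """Return [department, manager] for a row, with placeholders when unknown."""
    data = lookup.get('SN_' + row['StaffNumber'], {})
    return [data.get('Department', 'unknown'), data.get('Manager', 'unknown')]


df = pd.DataFrame(data=staff)
# result_type='expand' turns each returned list into columns, which are then
# assigned positionally to the two new column names.
df[['Department', 'Manager']] = df.apply(
    get_department_and_manager, axis='columns', result_type='expand')
print(df)
```

Note the column selector is a single list, df[['Department', 'Manager']], rather than two chained brackets. For tens of thousands of rows the merge shown above is likely to be faster than any row-wise apply.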

What is the easiest way to convert an API output to a data frame [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
AcledData = pd.read_csv("https://api.acleddata.com/acled/read?terms=accept&country=Afghanistan&date=20200315.csv", sep=',',quotechar='"', encoding ='utf-8')
print(AcledData)
Empty DataFrame
Columns: [{"status":200, success:true, last_update:117, count:500, data:[{"data_id":"6996791", iso:"4", event_id_cnty:"AFG44631", event_id_no_cnty:"44631", event_date:"2020-03-21", year:"2020", time_precision:"1", event_type:"Battles", sub_event_type:"Armed clash", actor1:"Taliban", assoc_actor_1:"", inter1:"2", actor2:"Military Forces of Afghanistan (2014-)", assoc_actor_2:"", inter2:"1", interaction:"12", region:"Caucasus and Central Asia", country:"Afghanistan", admin1:"Balkh", admin2:"Dawlat Abad", admin3:"", location:"Dawlat Abad", latitude:"36.9882", longitude:"66.8207", geo_precision:"2", source:"Xinhua; Khaama Press", source_scale:"National-International", notes:"On 21 March 2020, 12 Taliban militants including 2 commanders were killed and 5 including a commander were wounded when Afghan forces repulsed their attack in Dawlat Abad district, Balkh.", fatalities:"12", timestamp:"1584984341", iso3:"AFG"}, {"data_id":"6997066", iso:"4".1, event_id_cnty:"AFG44667", event_id_no_cnty:"44667", event_date:"2020-03-21".1, year:"2020".1, time_precision:"1".1, event_type:"Violence against civilians", sub_event_type:"Attack", actor1:"Unidentified Armed Group (Afghanistan)", assoc_actor_1:"".1, inter1:"3", actor2:"Civilians (Afghanistan)", assoc_actor_2:"Muslim Group (Afghanistan); Teachers (Afghanistan)", inter2:"7", interaction:"37", region:"Caucasus and Central Asia".1, country:"Afghanistan".1, admin1:"Kabul", admin2:"Kabul", admin3:"".1, location:"Kabul", latitude:"34.5167", longitude:"69.1833", geo_precision:"1", source:"Pajhwok Afghan News", source_scale:"National", notes:"On 21 March 2020.1, 1 religious scholar and teacher was killed by an unknown gunmen in Kabul city.", fatalities:"1", timestamp:"1584984341".1, iso3:"AFG"}.1, {"data_id":"6997171", iso:"4".2, event_id_cnty:"AFG44715", event_id_no_cnty:"44715", event_date:"2020-03-21".2, year:"2020".2, time_precision:"2", event_type:"Battles".1, sub_event_type:"Armed clash".1, actor1:"Taliban".1, 
assoc_actor_1:"".2, inter1:"2".1, actor2:"Military Forces of Afghanistan (2014-)".1, assoc_actor_2:"".1, inter2:"1".1, interaction:"12".1, region:"Caucasus and Central Asia".2, country:"Afghanistan".2, admin1:"Balkh".1, admin2:"Nahri Shahi", admin3:"".2, location:"Nahri Shahi", latitude:"36.8544", longitude:"67.1800", geo_precision:"2".1, source:"Voice of Jihad", source_scale:"Other", notes:"As reported on 21 March 2020, 3 Afghan security personnel were killed and 5 were wounded following an attack by Taliban militants on a check point in Nahri Shahi district, Balkh. Fatalities coded as 0 (VoJ reported 3 fatalities).", fatalities:"0", ...]
Index: []
The query returns a JSON string, but the relevant data is nested inside that JSON. You will have to:
read the returned string as a JSON object
use the relevant part of that object to feed a dataframe
For example, using urllib.request, you could do:
import json
import urllib.request
import pandas as pd
data = json.load(urllib.request.urlopen('https://api.acleddata.com/acled/read?terms=accept&country=Afghanistan&date=20200315.csv'))['data']
df = pd.DataFrame(data)
If you want to convert that to a csv file, there is no need for pandas; you can use the csv module instead:
import csv
data = json.load(urllib.request.urlopen('https://api.acleddata.com/acled/read?terms=accept&country=Afghanistan&date=20200315.csv'))['data']
with open('file.csv', 'w', newline='') as fd:
    wr = csv.DictWriter(fd, fieldnames=data[0].keys())
    _ = wr.writeheader()
    for d in data:
        _ = wr.writerow(d)
This is not csv, this is JSON
import pandas as pd
api = "https://api.acleddata.com/acled/read?terms=accept&country=Afghanistan&date=20200315.csv"
AcledData = pd.read_json(api)
The data field is then itself JSON, but you can use a similar technique or DataFrame methods to get what you want.
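To show the nested step without calling the API, here is a sketch that works on a trimmed, hypothetical stand-in for the response. Only a few of the fields from the question's output are kept, and the variable names raw, payload and events are assumptions.

```python
import json
import pandas as pd

# A trimmed, hypothetical stand-in for the API response (two events, few fields).
raw = '''{"status": 200, "success": true, "count": 2, "data": [
  {"data_id": "6996791", "event_type": "Battles", "admin1": "Balkh", "fatalities": "12"},
  {"data_id": "6997066", "event_type": "Violence against civilians", "admin1": "Kabul", "fatalities": "1"}
]}'''

payload = json.loads(raw)

# The records live under the "data" key; each dict becomes one row.
events = pd.DataFrame(payload['data'])

# Values arrive as strings, so convert numeric columns explicitly.
events['fatalities'] = events['fatalities'].astype(int)
print(events)
```

The same pattern applies to the real response: parse the JSON first, then pass only the list of records to the DataFrame constructor.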

Querying JSON with Python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 5 years ago.
I have parsed the JSON with json.load.
Now I want to query that JSON dict using SQL-like commands. Does anything exist like this in Python? I tried using Pynq https://github.com/heynemann/pynq but that didn't work too well and I've also looked into Pandas but not sure if that's what I need.
Here is a simple pandas example with Python 2.7 to get you started...
import json
import pandas as pd
jsonData = '[ {"name": "Frank", "age": 39}, {"name": "Mike", "age": 18}, {"name": "Wendy", "age": 45} ]'
# using json.loads because I'm working with a string for example
d = json.loads(jsonData)
# convert to pandas dataframe
dframe = pd.DataFrame(d)
# Some example queries
# calculate mean age
mean_age = dframe['age'].mean()
# output - mean_age
# 34.0
# select under 40 participants
young = dframe.loc[dframe['age']<40]
# output - young
#    age   name
# 0   39  Frank
# 1   18   Mike
# select Wendy from data
wendy = dframe.loc[dframe['name']=='Wendy']
# output - wendy
#    age   name
# 2   45  Wendy
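If you want actual SQL rather than pandas indexing, one option is to load the parsed records into an in-memory SQLite database with the standard-library sqlite3 module. This is only a sketch: the table name people and the variable names are assumptions, and it presumes every record has the same flat keys.

```python
import json
import sqlite3

json_data = ('[{"name": "Frank", "age": 39}, {"name": "Mike", "age": 18}, '
             '{"name": "Wendy", "age": 45}]')
records = json.loads(json_data)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
# Named placeholders (:name, :age) are filled from the keys of each dict.
conn.executemany("INSERT INTO people VALUES (:name, :age)", records)

# calculate mean age
mean_age = conn.execute("SELECT AVG(age) FROM people").fetchone()[0]

# select participants under 40, youngest first
young = conn.execute(
    "SELECT name, age FROM people WHERE age < 40 ORDER BY age"
).fetchall()

print(mean_age)
print(young)
```

This suits flat records; deeply nested JSON would need flattening before it fits into table rows.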

Storing data in Redis then fetch that data [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have an Excel file that contains four columns. I want to fetch this data and store it in MySQL. Later on I want to fetch the data from there and store it in Redis, then run a validation on it. I have already imported the data from Excel into Python.
You have to reshape your 4-column Excel data into 1-column data.
The Redis client for Matlab/GNU Octave does this, for example: https://github.com/markuman/go-redis/wiki/Data-Structure#arrays
Note that in that example, Matlab/Octave use column-major order.
Python uses row-major order: https://en.wikipedia.org/wiki/Row-major_order
So you have to save your data (4 columns by X rows) in Redis as a list in row-major order (RPUSH).
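To make the row-major idea concrete without involving Redis, here is a small sketch. The table contents and the helper names cell and unflatten are hypothetical; the point is only the index arithmetic that lets you find a cell again in the flat list.

```python
# A hypothetical 3-row, 4-column table.
rows = [
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
    ["a3", "b3", "c3", "d3"],
]
n_cols = len(rows[0])

# Row-major order: all cells of row 0, then all cells of row 1, and so on.
flat = [value for row in rows for value in row]


def cell(flat_list, row, col, width):
    """Look up a cell in a row-major flat list (row and col are zero-based)."""
    return flat_list[row * width + col]


def unflatten(flat_list, width):
    """Rebuild the list of rows from a row-major flat list."""
    return [flat_list[i:i + width] for i in range(0, len(flat_list), width)]


print(cell(flat, 1, 2, n_cols))
print(unflatten(flat, n_cols) == rows)
```

The flat list is what would be pushed to Redis element by element; the number of columns has to be stored or known separately to rebuild the table.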
example
given this excel sheet
using this python3 code
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
Created on Tue Oct 20 23:02:53 2015
#author: markus
"""
import pandas as pd
import redis
# redis connection
r = redis.StrictRedis(host='localhost', port=6379, db=0)
# open the first worksheet
df = pd.read_excel('/home/markus/excel.xlsx',0)
# read in as a list
# [[1, 'two', 'python'], ['excel', 'redis', 'action']]
a = list(df.T.itertuples())
print("this is a, your excel list")
print(a)
# push every value onto one redis list
# (loop variable renamed so it no longer shadows the built-in list)
for values in a:
    for value in values:
        r.rpush('myexceldata', str(value))
# read all back to python
b = r.lrange('myexceldata', '0', '-1')
print("A1 becomes 0, B1 becomes 3 ...")
print(b[3].decode('UTF-8'))
to save it serialized as a list in redis
127.0.0.1:6379> lrange myexceldata 0 -1
1) "1"
2) "two"
3) "python"
4) "excel"
5) "redis"
6) "action"
This is just one way to save a spreadsheet in Redis. The right choice always depends on your data structure and what you are going to do with it.
