Store list elements in single variable for query - python

I am facing what is probably a very simple problem, but I think I am overcomplicating it.
I have an Excel file with city names and postal codes.
I read the file and export the postal codes (PLZ) with
zipfile = pd.read_excel("file.xlsx")
zipcode = pd.DataFrame(zipfile, columns=['PLZ']).values
Output is: [80331][80333] ....
Each ZIP code is later used to conduct a query on a website.
For that I use bs4 and requests and the following line of code (this is not the complete code, just the relevant line):
data = {'tx_ybpn_storefinder[searchReq][term]': zip}
The process is:
Enter the ZIP code from the list (in "zip")
Query on the website
Save the results (data) of the website-query
Query with the next ZIP code
Save data from query
Repeat for every zip code in the list
I think I have to work here with a for/while-loop-combination, but I don't actually know how. Is it necessary to store each zip code in a unique variable?
Thanks in advance!

I think I have to work here with a for/while-loop-combination
Right. Loop over the values in the PLZ column:
zipcode = pd.read_excel("file.xlsx")
for zip in zipcode['PLZ']:
    data = {'tx_ybpn_storefinder[searchReq][term]': zip}
    # query the website, etc.
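As for the second part of the question: no, a separate variable per zip code is not needed. The steps listed in the question could be sketched as below. This is only a sketch under assumptions: `query_site` is a hypothetical placeholder for the real requests/bs4 call (which is not shown in the question), and a plain list stands in for the PLZ column.

```python
# Stand-in for the PLZ column; with pandas this could be zipcode['PLZ'].tolist().
plz_values = [80331, 80333]


def query_site(plz):
    """Hypothetical placeholder for the real requests/bs4 query."""
    data = {"tx_ybpn_storefinder[searchReq][term]": plz}
    # A real version would send `data` to the website and parse the response here.
    return data


results = {}
for plz in plz_values:
    # The single loop variable is reused for every zip code.
    results[plz] = query_site(plz)

print(results[80331])
```

Keeping the results in a dictionary keyed by zip code makes it easy to look up what each query returned afterwards.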


transform JSON file to be usable

Long story short, I get a response from the Spotify API: JSON that has data about the newest albums. How do I get specific info from it, for example every band name or every album title? I've tried a lot of approaches that I found on the internet and nothing seems to work for me, and after a couple of hours I'm kind of frustrated.
The JSON data is on jsfiddle.
Here is the request:
endpoint = "https://api.spotify.com/v1/browse/new-releases"
lookup_url = f"{endpoint}"
r = requests.get(lookup_url, headers=headers)
print(r.json())
you can find the
As the comments mentioned, this request returns a dictionary whose keys and values you can then access. For example, assuming the parsed response is stored in data, you could get the album_type like this:
print(data["albums"]["items"][0]["album_type"])
Since items contains a list, you need to take its first element (index 0) and then access album_type.
Output:
single
Here is a link to the code I used with your json.
I suggest you look into how to deal with json data in python, this is a good place to start.
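To get every band name or every album title rather than a single value, the same key access can be put in a loop. A minimal sketch, assuming each entry in items has a name key and an artists list whose entries also have a name key (check this against the actual response); the sample data below is made up for illustration:

```python
# Tiny hand-made sample with the nesting assumed above
# (albums -> items -> name / artists); the values are invented.
data = {
    "albums": {
        "items": [
            {"album_type": "single", "name": "First Title",
             "artists": [{"name": "Band A"}]},
            {"album_type": "album", "name": "Second Title",
             "artists": [{"name": "Band B"}, {"name": "Band C"}]},
        ]
    }
}

# One title per album.
album_titles = [item["name"] for item in data["albums"]["items"]]

# An album can list several artists, so flatten the nested lists.
artist_names = [artist["name"]
                for item in data["albums"]["items"]
                for artist in item["artists"]]

print(album_titles)
print(artist_names)
```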
I copied the data from the jsfiddle link.
Now try the following code:
import ast
pyobj=ast.literal_eval(str_cop_from_src)
Then you can access values by key:
pyobj["albums"]["items"][0]["album_type"]
pyobj will be a Python dictionary with all the data.
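One caveat with this approach: ast.literal_eval only accepts Python literals, so it may fail on JSON text that contains true, false or null. The sketch below, using a made-up JSON string, shows json.loads handling such text while ast.literal_eval rejects it:

```python
import ast
import json

# Made-up JSON text; note the lowercase true and null, which are JSON literals.
raw = '{"albums": {"items": [{"album_type": "single", "is_playable": true, "label": null}]}}'

pyobj = json.loads(raw)  # true becomes True, null becomes None
print(pyobj["albums"]["items"][0]["album_type"])

try:
    ast.literal_eval(raw)
    literal_eval_ok = True
except ValueError:
    # true and null are not Python literals, so literal_eval rejects them.
    literal_eval_ok = False
print(literal_eval_ok)
```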

Python - I want to increase the Row-Index automatically

I am absolutely new to Python, and to coding for that matter, so any help would be greatly appreciated. I have around 21 Salesforce orgs and am trying to get some information from each of the orgs into one place to send out in an email.
import pandas as pd
df = pd.read_csv("secretCSV.csv", usecols = ['client','uname','passw','stoken'])
username = df.loc[[1],'uname'].values[0]
password = df.loc[[1],'passw'].values[0]
sectocken = df.loc[[1],'stoken'].values[0]
I have saved all my usernames, passwords, and security tokens in the secretCSV.csv file, and with the above code I can get the data for the one row whose index value I have given. I would like to know how I can loop through this and, after each loop, increase the index value until all rows from the CSV file are read.
Thank you in advance for any assistance you all can offer.
Adil
--
You can iterate over the DataFrame, but it is strongly discouraged (inefficient, ugly, too much code, etc.).
df = pd.read_csv("secretCSV.csv", usecols = ['client','uname','passw','stoken'])
so DO NOT DO THIS EVEN IF IT WORKS:
for i in range(0, df.shape[0]):
    username = df.loc[[i],'uname'].values[0]
    password = df.loc[[i],'passw'].values[0]
    sectocken = df.loc[[i],'stoken'].values[0]
Instead, do this:
sec_list = [(u,p,s) for _,u,p,s in df.values]
now you have a sec_list with tuples (username, password, sectocken)
access example: sec_list[0][1] - as in row=0 and get the password (located at [1]).
Pandas is great when you want to apply operations to a large set of data, but it is usually not a good fit when you want to manipulate individual cells in Python. Each cell would need to be converted to a Python object each time it's touched.
For your goals, I think the standard csv module is what you want
import csv
with open("secretCSV.csv", newline='') as f:
    for username, password, sectoken in csv.reader(f):
        pass  # do all the things
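Note that the loop above expects exactly three columns and no header row. Since the question's file has named columns (client, uname, passw, stoken), csv.DictReader may be a better fit. A minimal sketch, assuming those four column names; `get_license` is a hypothetical placeholder for the real Salesforce call, and the in-memory sample rows are invented:

```python
import csv
import io

# In-memory stand-in for secretCSV.csv; the rows are invented.
sample = io.StringIO(
    "client,uname,passw,stoken\n"
    "acme,user1,pw1,tok1\n"
    "globex,user2,pw2,tok2\n"
)


def get_license(username, password, sectoken):
    """Hypothetical placeholder for the real Salesforce call."""
    return f"license info for {username}"


results = {}
for row in csv.DictReader(sample):
    # DictReader consumes the header row and keys each row by column name.
    results[row["client"]] = get_license(row["uname"], row["passw"], row["stoken"])

print(results)
```

With a real file, `sample` would be replaced by the handle from open("secretCSV.csv", newline='').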
Thank you everyone for your responses. I think I will first start with python learning and then get back to this. I should have learnt coding before coding. :)
Also, I was able to iterate (sorry, most of you said not to iterate the dataframe) and get the credentials from the file.
I actually have 21 salesforce orgs and am trying to get License information from each of them and email to certain people on a daily basis. I didn't want to expose salesforce credentials, hence, went with a flat file option.
I have built the code to get the Salesforce license details and am able to pull them in the format I want for one client. However, I have to do this for 21 clients, so I thought of iterating over the credentials so that I can run the getLicense function in a loop until the data for all 21 clients is fetched.
I will learn Python or at least learn a little bit more than what I know now and come back to this again. Until then, Informatica and batch script would have to do.
Thank you again to each one of you for your help!
Adil
--

How would I be able to remove this part of the variable?

I am writing a guessing game. The data for the game is in a CSV file, so I decided to use pandas. I have tried to use pandas to import my CSV file, pick a random row, and put the data into variables so I can use it in the rest of the code, but I can't figure out how to format the data in the variables correctly.
I've tried to split the string with split(), but I am quite lost.
ar = pandas.read_csv('names.csv')
ar.columns = ["Song Name","Artist","Intials"]
randomsong = ar.sample(1)
songartist = randomsong["Artist"]
songname = (randomsong["Song Name"])
songintials = randomsong["Intials"]
print(songname)
My CSV file looks like this.
Song Name,Artist,Intials
Someone you loved,Lewis Capaldi,SYL
Bad Guy,Billie Eilish,BG
Ransom,Lil Tecca,R
Wow,Post Malone, W
I expect the output to be the name of the song from the csv file. For Example
Bad Guy
Instead the output is
1 Bad Guy
Name: Song Name, dtype:object
If anyone knows the solution please let me know. Thanks
You're getting a Series object as output. To print only the value, without the index, you can try
randomsong["Song Name"].to_string(index=False)
Use df['column'].values to get the values of a column.
In your case, songartist = randomsong["Artist"].values[0], because you want only the first element of the returned array.
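Another option that should give the same result is positional indexing with .iloc[0], which returns the single cell as a plain Python value. A small sketch, using an in-memory copy of two rows from the question in place of names.csv:

```python
import io
import pandas as pd

# In-memory stand-in for names.csv, using rows from the question.
csv_text = (
    "Song Name,Artist,Intials\n"
    "Someone you loved,Lewis Capaldi,SYL\n"
    "Bad Guy,Billie Eilish,BG\n"
)
ar = pd.read_csv(io.StringIO(csv_text))

randomsong = ar.sample(1)                    # one-row DataFrame
songname = randomsong["Song Name"].iloc[0]   # plain Python str
songartist = randomsong["Artist"].iloc[0]

print(songname)
```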

Reading a dictionary from within a dictionary

I have a json file for tweet data. The data that I want to look at is the text of the tweet. For some reason, some of the tweets are too long to put into the normal text part of the dictionary.
It seems like there is a dictionary within another dictionary and I can't figure out how to access it very well.
Basically, what I want in the end is one column of a data frame that will have all of the text from each individual tweet. Here is a link to a small sample of the data that contains a problem tweet.
Here is the code I have so far:
import json
import pandas as pd
tweets = []
#This writes the json file so that I can work with it. This part works correctly.
with open("filelocation.txt") as source:
    for line in source:
        if line.strip():
            tweets.append(json.loads(line))
print(len(tweets))
df = pd.DataFrame.from_dict(tweets)
df.info()
When looking at the info you can see that there will be a column called extended_tweet that only encompasses one of the two sample tweets. Within this column, there seems to be another dictionary with one of those keys being full_text.
I want to add another column to the dataframe that just has this information along with the normal text column when the full_text is null.
My first thought was to try and read that specific column of the dataframe as a dictionary again using:
d = pd.DataFrame.from_dict(tweets['extended_tweet']['full_text'])
But this doesn't work. I don't really understand why that doesn't work as that is how I read the data the first time.
My guess is that I can't look at the specific names because I am going back to the list and it would have to read all or none. The error it gives me says "KeyError: 'full_text' "
I also tried using the recommendation provided by this website. But this gave me a None value no matter what.
Thanks in advance!
I tried to do what @Dan D. suggested; however, this still gave me errors. But it gave me the idea to try this:
tweet[0]['extended_tweet']['full_text']
This works and gives me the value that I am looking for. But I need to run through the whole thing. So I tried this:
df['full'] = [tweet[i]['extended_tweet']['full_text'] for i in range(len(tweet))]
This gives me "Key Error: 'extended_tweet' "
Does it seem like I am on the right track?
I would suggest to flatten out the dictionaries like this:
tweet = json.loads(line)
tweet['full_text'] = tweet['extended_tweet']['full_text']
tweets.append(tweet)
I don't know if the answer suggested earlier works. I never got that successfully. But I did figure out something else that works well for me.
What I really needed was a way to display the full text of a tweet. I first loaded the tweets from the JSON with what I posted above. Then I noticed that in the data file there is a field called truncated. If this value is true, the tweet is cut short and the full tweet is placed within
tweet[i]['extended_tweet']['full_text']
In order to access it, I used this:
tweet_list = []
for i in range(len(tweets)):
    # json.loads turns JSON true/false into the booleans True/False, not strings
    if tweets[i]['truncated']:
        tweet_list.append(tweets[i]['extended_tweet']['full_text'])
    else:
        tweet_list.append(tweets[i]['text'])
Then I can work with the data using the whole text from each tweet.
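The KeyError seen earlier comes from tweets that have no extended_tweet entry at all. One way to handle both cases without checking truncated is dict.get with a fallback. A minimal sketch, using two made-up lines shaped like the data described above:

```python
import json

# Two made-up lines shaped like the data described above.
lines = [
    '{"text": "short tweet", "truncated": false}',
    '{"text": "long tweet cut off...", "truncated": true, '
    '"extended_tweet": {"full_text": "long tweet in full"}}',
]
tweets = [json.loads(line) for line in lines]

# Fall back to the normal text when there is no extended_tweet entry.
full_texts = [
    tweet.get("extended_tweet", {}).get("full_text", tweet["text"])
    for tweet in tweets
]
print(full_texts)
```

The resulting list has one entry per tweet, so it could be assigned directly to a new DataFrame column such as df['full'].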

Search a single column for a particular value in a CSV file and return an entire row

Issue
The code does not correctly identify the input (item). It simply dumps to my failure message even if such a value exists in the CSV file. Can anyone help me determine what I am doing wrong?
Background
I am working on a small program that asks for user input (function not given here), searches a specific column in a CSV file (Item) and returns the entire row. The CSV data format is shown below. I have shortened the data from the actual amount (49 field names, 18000+ rows).
Code
import csv
from collections import namedtuple
from contextlib import closing
def search():
    item = 1000001
    raw_data = 'active_sanitized.csv'
    failure = 'No matching item could be found with that item code. Please try again.'
    check = False
    with closing(open(raw_data, newline='')) as open_data:
        read_data = csv.DictReader(open_data, delimiter=';')
        item_data = namedtuple('item_data', read_data.fieldnames)
        while check == False:
            for row in map(item_data._make, read_data):
                if row.Item == item:
                    return row
                else:
                    return failure
CSV structure
active_sanitized.csv
Item;Name;Cost;Qty;Price;Description
1000001;Name here:1;1001;1;11;Item description here:1
1000002;Name here:2;1002;2;22;Item description here:2
1000003;Name here:3;1003;3;33;Item description here:3
1000004;Name here:4;1004;4;44;Item description here:4
1000005;Name here:5;1005;5;55;Item description here:5
1000006;Name here:6;1006;6;66;Item description here:6
1000007;Name here:7;1007;7;77;Item description here:7
1000008;Name here:8;1008;8;88;Item description here:8
1000009;Name here:9;1009;9;99;Item description here:9
Notes
My experience with Python is relatively little, but I thought this would be a good problem to start with in order to learn more.
I determined the methods to open (and wrap in a close function) the CSV file, read the data via DictReader (to get the field names), and then create a named tuple to be able to quickly select the desired columns for the output (Item, Cost, Price, Name). Column order is important, hence the use of DictReader and namedtuple.
While there is the possibility of hard-coding each of the field names, I felt that if the program can read them on file open, it would be much more helpful when working on similar files that have the same column names but different column organization.
Research
CSV Header and named tuple:
What is the pythonic way to read CSV file data as rows of namedtuples?
Converting CSV data to tuple: How to split a CSV row so row[0] is the name and any remaining items are a tuple?
There were additional links of research, but I cannot post more than two.
You have three problems with this:
You return on the first failure, so it will never get past the first line.
You are reading strings from the file, and comparing to an int.
_make iterates over the dictionary keys, not the values, producing the wrong result (item_data(Item='Name', Name='Price', Cost='Qty', Qty='Item', Price='Cost', Description='Description')).
for row in (item_data(**data) for data in read_data):
    if row.Item == str(item):
        return row
return failure
This fixes the issues at hand - we check against a string, and we only return if none of the items matched (although you might want to begin converting the strings to ints in the data rather than this hackish fix for the string/int issue).
I have also changed the way you are looping - using a generator expression makes for a more natural syntax, using the normal construction syntax for named attributes from a dict. This is cleaner and more readable than using _make and map(). It also fixes problem 3.
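Put together, the corrected search might look like the sketch below. It is an illustration under assumptions rather than the asker's final code: the function takes an open file object and the item code as parameters, returns None instead of the failure message, and reads from an in-memory copy of the first two data rows instead of active_sanitized.csv.

```python
import csv
import io
from collections import namedtuple

# In-memory stand-in for active_sanitized.csv (first rows from the question).
SAMPLE = (
    "Item;Name;Cost;Qty;Price;Description\n"
    "1000001;Name here:1;1001;1;11;Item description here:1\n"
    "1000002;Name here:2;1002;2;22;Item description here:2\n"
)


def search(open_data, item):
    """Return the first row whose Item column matches item, else None."""
    read_data = csv.DictReader(open_data, delimiter=';')
    item_data = namedtuple('item_data', read_data.fieldnames)
    for data in read_data:
        row = item_data(**data)      # build the tuple from the dict's values
        if row.Item == str(item):    # the file holds strings, so compare as str
            return row
    return None                      # reached only after every row was checked


found = search(io.StringIO(SAMPLE), 1000002)
missing = search(io.StringIO(SAMPLE), 42)
print(found)
print(missing)
```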
