I've got multiple Excel files and I need a specific value, but in each file the cell with the value changes position slightly. However, this value is always preceded by a generic description of it, which remains constant across all the files.
I was wondering if there was a way to ask Python to grab the value to the right of the element containing the string "xxx".
Try iterating over the Excel files (I guess you loaded each as a separate pandas object?),
something like for df in [dataframe1, dataframe2, ..., dataframeN].
Then you could pick the column you need (if the column stays constant), e.g. df['columnX'], and find which index the label has:
df.index[df['columnX'] == "xxx"]. It may make sense to add .tolist() at the end, so that if "xxx" is a value that repeats more than once, you get all occurrences in a list.
The last step would be to take index + 1 to get the value you want.
Hope it was helpful.
In general I would highly suggest being more specific in your questions and providing code / examples.
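To make that concrete, here is a minimal sketch assuming the label and its value sit side by side in the same row (the column layout and the demo frame are made up, since the real files weren't shown):

```python
import pandas as pd

def value_right_of(df, label="xxx"):
    # Find the cell equal to the label, then return the cell one column
    # to its right (assumes the label is not in the last column).
    mask = df.astype(str).eq(label)
    for col_idx, col in enumerate(df.columns):
        hits = mask[col]
        if hits.any():
            row_pos = df.index.get_loc(hits.idxmax())  # first matching row
            return df.iloc[row_pos, col_idx + 1]
    return None

# Demo: a small frame where the label's position could vary between files.
df = pd.DataFrame([["a", "b", "c"],
                   ["xxx", 42, "d"]])
print(value_right_of(df))  # 42
```

You would then loop this over each loaded file, e.g. values = [value_right_of(df) for df in dataframes].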
This is my code:
dv = DataValidation(type="list", formula1='"11111,22222,33333,44444,55555,66666,77777,88888,99999,111110,122221,133332,144443,155554,166665,177776,188887,199998,211109,222220,233331,244442,255553,266664,277775,288886,299997,311108,322219,333330,344441,355552,366663,377774,388885,399996,411107,422218,433329,444440,455551,466662,477773,488884,499995,511106,522217,533328,544439,555550,566661,577772,588883,599994,611105,622216,633327,644438,655549,666660,677771,688882,699993,711104,722215,733326,744437,755548,766659,777770,788881,799992,811103,822214,833325,844436,855547,866658,877769,888880,899991,911102,922213,933324,944435,955546,966657,977768,988879,999990,1011101,1022212,1033323,1044434,1055545,1066656,1077767,1088878,1099989,1111100,1122211"', allow_blank=False)
sheet.add_data_validation(dv)
dv.add('K5')
But then I have this issue:
BUT if the formula1 list is small, then everything works fine.
What is the way to add a BIG list of options that will not cause issues (as you can see above)?
Excel may impose additional limits on what it accepts. See https://learn.microsoft.com/en-us/openspecs/office_standards/ms-oi29500/8ebf82e4-4fa4-43a6-9ecd-d2d793a6f4bf. There is additional information in the implementer's notes, but I cannot find the exact passage referred to.
Basically, I think it's generally easier to refer to values on a separate sheet.
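For example, with openpyxl you can put the options on a helper sheet and point the validation at that range instead of an inline string; inline list strings are subject to Excel's 255-character formula limit, which is likely what you're hitting. A sketch (the sheet name "Lists", the number of options, and the target cell are assumptions for the demo):

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
sheet = wb.active

# Write the (long) list of allowed values down a column on a helper sheet.
options = wb.create_sheet("Lists")
for row, value in enumerate(range(11111, 11111 * 11, 11111), start=1):
    options.cell(row=row, column=1, value=value)

# Point the validation at that range rather than an inline quoted string,
# sidestepping the 255-character limit on inline list formulas.
dv = DataValidation(type="list", formula1="=Lists!$A$1:$A$10", allow_blank=False)
sheet.add_data_validation(dv)
dv.add("K5")
wb.save("validated.xlsx")
```

You can also hide the helper sheet so end users only see the dropdown.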
I have loaded some data from CSV files into two dataframes, X and Y, that I intend to perform some analysis on. After cleaning them up a bit, I can see that the indexes of my dataframes appear to match (they're just sequential numbers), except one has an index of type object and the other has an index of type int64. Please see the attached image for a clearer idea of what I'm talking about.
I have tried manually altering this using X.index.astype('int64') and also X.reindex(Y.index) but neither seem to do anything here. Could anyone suggest anything?
Edit: Adding some additional info in case it is helpful. X was imported as row data from the csv file and transposed whereas Y was imported directly with the index set from the first column of the csv file.
So I've realised what I've done and it was pretty dumb. I should have written
X.index = X.index.astype('int64')
instead of just
X.index.astype('int64')
Oh well, the more you know.
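For anyone else who hits this: like most pandas operations, Index.astype returns a new object rather than modifying in place, so the result has to be assigned back. A minimal illustration (the frames are made-up stand-ins for X and Y):

```python
import pandas as pd

X = pd.DataFrame({"a": [1, 2]}, index=["0", "1"])  # object-dtype index
Y = pd.DataFrame({"b": [3, 4]}, index=[0, 1])      # int64 index

X.index.astype("int64")            # returns a NEW Index; X is left unchanged
X.index = X.index.astype("int64")  # assigning the result back is what changes X
print(X.index.dtype)  # int64
```

After the assignment, X.index.equals(Y.index) is True and the frames can be aligned.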
I'm currently working on a project that takes a csv list of student names who attended a meeting, and converts it into a list (later to be compared to full student roster list, but one thing at a time). I've been looking for answers for hours but I still feel stuck. I've tried using both pandas and the csv module. I'd like to stick with pandas, but if it's easier in the csv module that works too. CSV file example and code below.
The file is autogenerated by our video call software- so the formatting is a little weird.
Attendance.csv
see sample as image, I can't insert images yet
Code:
data = pandas.read_csv("2A Attendance Report.csv", header=3)
AttendanceList = data['A'].to_list()
print(str(AttendanceList))
However, this is raising KeyError: 'A'
Any help is really appreciated, thank you!!!
As seen in the sample image, you have column headers in the first row itself. Hence you need to remove header=3 from your read_csv call. Either replace it with header=0 or don't specify any explicit header value at all (the first row is used as the header by default).
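A minimal sketch of the corrected call, using an in-memory stand-in for the CSV (the real file isn't shown, so the single column name "A" and the sample names are assumptions based on the question's data['A'] lookup):

```python
import io
import pandas as pd

# Stand-in for "2A Attendance Report.csv" with headers in the first row.
csv_text = "A\nAlice Smith\nBob Jones\nCarol Diaz\n"

# header=0 is the default: the first row becomes the column names.
# header=3 would instead skip down to the fourth row and misread the data,
# which is why data['A'] raised KeyError.
data = pd.read_csv(io.StringIO(csv_text))
attendance_list = data["A"].to_list()
print(attendance_list)  # ['Alice Smith', 'Bob Jones', 'Carol Diaz']
```

From here the list can be compared against the full roster, e.g. with set operations.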
I am trying to add a column from one dataframe to another:
df.head()
street_map2[["PRE_DIR","ST_NAME","ST_TYPE","STREET_ID"]].head()
The PRE_DIR is just the prefix of the street name. What I want to do is add the STREET_ID column to df, matched on the associated street. I have tried a few approaches, but my inexperience with pandas and with comparing strings is getting in the way:
street_map2['STREET'] = df["STREET"]
street_map2['STREET'] = np.where(street_map2['STREET'] == street_map2["ST_NAME"])
The above code raises "ValueError: Length of values does not match length of index". I've also tried using street_map2['STREET'].str in street_map2["ST_NAME"].str. Can anyone think of a good way to do this? (Note it doesn't need to be 100% accurate, just get most of them, and it can be completely different from the approach tried above.)
EDIT Thank you to all who have tried so far; I have not resolved the issue yet. Here is some more data:
street_map2["ST_NAME"]
I have tried this approach as suggested, but still have some indexing problems:
def get_street_id(street_name):
return street_map2[street_map2['ST_NAME'].isin(df["STREET"])].iloc[0].ST_NAME
df["STREET_ID"] = df["STREET"].map(get_street_id)
df["STREET_ID"]
This throws this error,
If it helps, the data frames are not the same length. Any more ideas, or a way to fix the above, would be greatly appreciated.
For you to do this, you need to merge these dataframes. One way to do it is:
df.merge(street_map2, left_on='STREET', right_on='ST_NAME')
What this will do is look for equal values in the ST_NAME and STREET columns and fill each matching row with the values from the other columns of both dataframes.
Check this link for more information: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
Also, the strings on the columns you try to merge on have to match perfectly (case included).
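A toy example of that merge (the street names and IDs are made up):

```python
import pandas as pd

# Stand-ins for df and street_map2.
df = pd.DataFrame({"STREET": ["MAIN", "OAK", "ELM"]})
street_map2 = pd.DataFrame({"ST_NAME": ["OAK", "MAIN"],
                            "STREET_ID": [101, 100]})

merged = df.merge(street_map2, left_on="STREET", right_on="ST_NAME")
print(merged)
# Rows whose STREET has no matching ST_NAME ("ELM" here) are dropped by the
# default inner join; pass how="left" to keep them with NaN for STREET_ID.
```

If you need to keep every row of df, df.merge(street_map2, left_on="STREET", right_on="ST_NAME", how="left") does that.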
You can do something like this, with a map function:
df["STREET_ID"] = df["STREET"].map(get_street_id)
Where get_street_id is defined as a function that, given a value from df["STREET"], will return a value to insert into the new column:
(disclaimer; currently untested)
def get_street_id(street_name):
    return street_map2[street_map2["ST_NAME"] == street_name].iloc[0].STREET_ID
We get a dataframe of street_map2 filtered down to the rows where the ST_NAME column equals the given street name:
street_map2[street_map2["ST_NAME"] == street_name]
Then we take the first such row with iloc[0] and return its STREET_ID value.
We can then add that error-tolerance that you've addressed in your question by updating the indexing operation:
...
street_map2[street_map2["ST_NAME"].str.contains(street_name)]
...
or perhaps,
...
street_map2[street_map2["ST_NAME"].str.startswith(street_name)]
...
Or, more flexibly:
...
street_map2[
street_map2["ST_NAME"].str.lower().str.replace("street", "st").str.startswith(street_name.lower().replace("street", "st"))
]
...
...which will lowercase both values, convert, for example, "street" to "st" (so the mappings are more likely to overlap), and then check for a prefix match.
If this is still not working for you, you may unfortunately need to come up with a more accurate mapping dataset between your street names! It is very possible that the street names are just too different to easily match with string comparisons.
(If you're able to provide some examples of street names and where they should overlap, we may be able to help you better develop a "fuzzy" match!)
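As a starting point for fuzzy matching, the standard library's difflib can already pick the closest candidate above a similarity cutoff, with no extra dependencies (the street names below are made up):

```python
import difflib

street_names = ["MAIN ST", "OAK AVE", "ELMWOOD DR"]

# Return the single best match whose similarity ratio is at least `cutoff`;
# an empty list means nothing was close enough.
match = difflib.get_close_matches("MAIN STREET", street_names, n=1, cutoff=0.6)
print(match)  # ['MAIN ST']
```

You could wrap this in your get_street_id function: look up the close match first, then index street_map2 with an exact comparison against it.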
Alright, I managed to figure it out, but the solution probably won't be too helpful if you aren't in the exact same situation with the same data. Bernardo Alencar's answer was essentially correct, except I was unable to apply an operation to the strings while doing the merge (I'm still not sure whether there is a way to do it). I found another dataset that had the street names formatted similarly to the first. I then merged the first with this third, new data frame. After this, the first and second both had a ["STREET_ID"] column. Then I finally managed to merge the second one with the combined one by using:
temp = combined["STREET_ID"]
CrimesToMapDF = street_maps.merge(temp, left_on='STREET_ID', right_on='STREET_ID')
Thus I got the desired final data frame with the associated street IDs.
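For anyone following along, the chain of merges described above looks roughly like this (all three frames are hypothetical stand-ins, since the real data isn't shown; note that a named Series like temp is accepted as the right side of a merge):

```python
import pandas as pd

# "first" and the bridge dataset share one street-name formatting;
# "street_maps" uses a different formatting but shares STREET_ID values.
first = pd.DataFrame({"STREET": ["MAIN STREET", "OAK AVENUE"]})
bridge = pd.DataFrame({"STREET": ["MAIN STREET", "OAK AVENUE"],
                       "STREET_ID": [100, 101]})
street_maps = pd.DataFrame({"ST_NAME": ["MAIN ST", "OAK AVE"],
                            "STREET_ID": [100, 101]})

combined = first.merge(bridge, on="STREET")  # attach STREET_ID to first
temp = combined["STREET_ID"]                 # a named Series is mergeable
CrimesToMapDF = street_maps.merge(temp, left_on="STREET_ID", right_on="STREET_ID")
print(CrimesToMapDF)
```

The ID column acts as the bridge between the two incompatible name formats, so no string comparison is needed in the final merge.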