pPolis2024-10-29
nishio UMAP visualization of 2022 Upper House election data from the University of Tokyo's Taniguchi Lab. There is a Communist island, and so on. https://gyazo.com/da86893dd4e4a0d4e8321f820251d9c9
nishio The Communist island and nearby mixed islands https://gyazo.com/7483712c10cac770769da024e2bcfb3e
nishio The LDP island, the Komeito island, and a large mixed island nearby. https://gyazo.com/93b7e7dd8c963b8f79e480aeee3d60ee
https://gyazo.com/ded0f771d275222169bd06dddcf01d5f
https://gyazo.com/843b394913c884514e61f3cffc752789
https://gyazo.com/a5ee7ba2f0041965ad410920eb56fbbc https://gyazo.com/eded459106b9331a9b8e018bd4ab3475
https://gyazo.com/1e04477f03347e2d01a6fc6d86264011 https://gyazo.com/f06909d884fe0dad9b8c15503af76fb1 https://gyazo.com/07ee5f54c0bb37176cda5662cf6710ff
https://gyazo.com/8676b0ea7e03071e89bc0169b408eb64
https://gyazo.com/c89cf28104d5923186df3a996c737c60
code:py
 import matplotlib.pyplot as plt
 import numpy as np
 import pandas as pd
 import umap
 # Assumes these are defined earlier: `matrix` (rows = respondents, index = User_ID),
 # `data` (one row per respondent, with a PARTY column) and
 # `political_parties` (PARTY code -> party name).
 # Dimensionality reduction by UMAP
 reducer = umap.UMAP(
     n_neighbors=15, min_dist=0.1, n_components=2, random_state=42, n_jobs=1
 )
 embedding = reducer.fit_transform(matrix)
 # Create a DataFrame containing the UMAP results and include the matrix index (User_ID)
 # Assumption: the column names UMAP1/UMAP2 are taken from the axis labels below
 umap_df = pd.DataFrame(embedding, columns=["UMAP1", "UMAP2"])
 umap_df["User_ID"] = matrix.index  # index of matrix as User_ID
 # Add User_ID column to data, then merge with umap_df
 data = data.reset_index().rename(
     columns={"index": "User_ID"}
 )  # Add index as User_ID
 merged_data = pd.merge(data, umap_df, on="User_ID", how="inner")  # Merge by User_ID
 data = merged_data
 # Prepare color map
 unique_parties = data["PARTY"].unique()
 colors = plt.cm.tab10(np.linspace(0, 1, len(unique_parties)))
 # One-character label per party; keys must match the names in political_parties
 # (drawing these labels needs a font with Japanese glyphs)
 parties_letter = {
     "Liberal Democratic Party": "自",
     "Constitutional Democratic Party of Japan": "立",
     "Komeito": "公",
     "Japan Restoration Association": "維",
     "National Democratic Party": "国",
     "Communist Party": "共",
     "Social Democratic Party": "社",
     "Reiwa Shinsengumi": "れ",
     "NHK Party": "N",
     "Sanseito": "参",
     "Minor parties": "諸",
     "Independent": "無",
 }
 # Visualization
 plt.figure(figsize=(10, 7))
 for i, party in enumerate(unique_parties):
     party_data = data[data["PARTY"] == party]
     pname = political_parties[party]
     pletter = parties_letter[pname]
     # Zero-size markers: this scatter only sets the axis limits and the legend entry
     # Assumption: x/y columns and color=colors[i] fill in arguments missing here
     plt.scatter(
         party_data["UMAP1"],
         party_data["UMAP2"],
         s=0,
         marker="+",
         color=colors[i],
         label=f"{pname}({pletter})",
     )
     # Draw each candidate as the party's letter
     for _, row in party_data.iterrows():
         plt.text(
             row["UMAP1"],
             row["UMAP2"],
             s=pletter,
             color=colors[i],
             fontsize=5,
             ha="center",
             va="center",
         )
 plt.title("Political Parties UMAP Visualization")
 plt.xlabel("UMAP1")
 plt.ylabel("UMAP2")
 plt.legend(title="political party")
 plt.show()
Next, it would be interesting to apply Polis-style cluster feature extraction to these clusters and generate AI cluster descriptions like the ones in the Public Opinion Map.
---
Regarding the Polis-like visualization of the 2022 House of Councillors (Upper House) election: I had previously done it with PCA, and the other day I had an AI explain the principal component axes to me, and the result wasn't very interesting.
Based on that, I thought, "Fine, the axes don't need to have a meaning, so let me visualize it a different way," and here is what I got with UMAP, which is also used in Talk to the City.
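For comparison, the earlier PCA view can be sketched as follows. This is a minimal sketch using only NumPy, assuming a hypothetical `votes` matrix (rows = respondents, columns = questions, answers coded -1/0/1). `pca_2d` and `top_questions` are illustrative names, not code from the actual analysis. The loadings are the kind of thing one would hand to an AI when asking it to explain an axis.

```python
import numpy as np


def pca_2d(matrix):
    """Project rows onto the first two principal components.

    Returns (coords, loadings): coords has shape (n_people, 2) and
    loadings has shape (2, n_questions).
    """
    centered = matrix - matrix.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:2]
    coords = centered @ loadings.T
    return coords, loadings


def top_questions(loadings_row, k=3):
    """Indices of the k questions with the largest absolute loading."""
    return np.argsort(-np.abs(loadings_row))[:k]


# Hypothetical toy data: 6 respondents x 4 questions
votes = np.array(
    [
        [1, 1, -1, 0],
        [1, 1, -1, 1],
        [1, 0, -1, 0],
        [-1, -1, 1, 0],
        [-1, -1, 1, 1],
        [-1, 0, 1, 0],
    ],
    dtype=float,
)
coords, loadings = pca_2d(votes)
pc1_questions = top_questions(loadings[0], k=2)
```

Here the first axis is dominated by questions 0 and 2, which is exactly the "two camps" split. An axis like this is easy to compute but says little beyond "these two groups disagree".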
It's kind of split like that.
It's interesting that there are "radical Communists" gathered on a remote islet populated only by Communists, and "moderate Communists" who live alongside members of other parties.
The same goes for the National Democratic Party.
Who are the LDP members mixed into the remote Restoration (Ishin) islet? (I could find out by looking it up, of course.)
I was planning to cluster the islands after UMAP and have the AI explain them. But with islands this clearly separated, it may make more sense for a human to pick out the clearly separated islands by specifying ranges, and to apply clustering only to the large islands to split each into a few parts.
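That hybrid workflow could look roughly like this. It is a sketch under assumptions: the rectangle ranges are hypothetical values a human would read off the plot, the toy `embedding` stands in for the real UMAP output, and plain k-means (written out in NumPy here) is just one possible way to split a large island.

```python
import numpy as np


def in_rectangle(points, x_range, y_range):
    """Boolean mask of the points inside an axis-aligned rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = sorted(x_range), sorted(y_range)
    return (
        (points[:, 0] >= x_lo)
        & (points[:, 0] <= x_hi)
        & (points[:, 1] >= y_lo)
        & (points[:, 1] <= y_hi)
    )


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Toy embedding: one remote islet plus one large island with two lobes
rng = np.random.default_rng(42)
small_island = rng.normal([10.0, 10.0], 0.2, size=(20, 2))
big_left = rng.normal([-1.5, 0.0], 0.3, size=(50, 2))
big_right = rng.normal([1.5, 0.0], 0.3, size=(50, 2))
embedding = np.vstack([small_island, big_left, big_right])

island_labels = np.full(len(embedding), -1)
# Human-specified range for the clearly separated islet
small_mask = in_rectangle(embedding, (8, 12), (8, 12))
island_labels[small_mask] = 0
# Human-specified range for the large island, then an automatic split into two
big_mask = in_rectangle(embedding, (-4, 4), (-3, 3))
island_labels[big_mask] = 1 + kmeans(embedding[big_mask], k=2)
```

The human decides where the islands are, and the algorithm only decides how to cut up what is left inside one range.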
---
To add context: this data, about 600 people, split into only two groups under PCA but into six under UMAP. So this was a preliminary experiment to see whether UMAP could produce a more detailed map of the Public Opinion Map, which currently splits into just two groups.
---
Select a range on the scatter plot and display which data points were selected
code:py
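 from matplotlib.widgets import RectangleSelector
 # Function to display the data inside a selection
 def on_select(eclick, erelease):
     x1, y1 = eclick.xdata, eclick.ydata  # Start point of selection
     x2, y2 = erelease.xdata, erelease.ydata  # End point of selection
     # Rows whose UMAP coordinates fall inside the rectangle
     # Assumption: this filter and the print arguments are missing from the
     # source; UMAP1/UMAP2 are the assumed coordinate columns of `data`
     selected = data[
         data["UMAP1"].between(min(x1, x2), max(x1, x2))
         & data["UMAP2"].between(min(y1, y2), max(y1, y2))
     ]
     # Show selections and data
     print(f"Selected range: ({x1:.2f}, {y1:.2f}) to ({x2:.2f}, {y2:.2f})")
     print(selected)
 # Monitor selection events using RectangleSelector
 fig, ax = plt.subplots()
 # Keep a reference to the selector so it is not garbage-collected
 selector = RectangleSelector(ax, on_select, useblit=True, button=[1], interactive=True)
 # Visualization
 for i, party in enumerate(unique_parties):
     party_data = data[data["PARTY"] == party]
     pname = political_parties[party]
     pletter = parties_letter[pname]
     # Zero-size markers: this scatter only sets the axis limits and the legend entry
     ax.scatter(
         party_data["UMAP1"],
         party_data["UMAP2"],
         s=0,
         marker="+",
         label=f"{pname}({pletter})",
     )
     # Draw each candidate as the party's letter
     for _, row in party_data.iterrows():
         ax.text(
             row["UMAP1"],
             row["UMAP2"],
             s=pletter,
             fontsize=5,
             ha="center",
             va="center",
         )
 ax.set_title("Political Parties UMAP Visualization")
 ax.set_xlabel("UMAP1")
 ax.set_ylabel("UMAP2")
 ax.legend(title="political party")
 plt.show()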
https://gyazo.com/5e7cd8225308b38dc02f6255d9742445
https://gyazo.com/ee781335c743d0654b1784f1805ce56b
Right now my Polis-style analysis (Polis minus the real-time scatter plot) lives in one monolithic script, but now that I understand its parts, I'll break it up.
One of those parts is a component that, given a set of clusters, outputs the characteristic opinions that separate the inside of each cluster from the outside.
The figure labels it "Fisher exact test", but that is not an accurate name for it.
The purpose it is used for is completely different.
We are not testing whether the clusters are different. Rather, we are repurposing the measure used for that test as a measure for selecting the questions that characterize a cluster.
Maybe we don't have the right words yet to describe the concept because we haven't treated this part as a component.
I want to carve this part out as a component that can be used independently.
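As a rough illustration of such a component: the sketch below scores each question with a one-sided Fisher exact p-value (agreement inside the cluster vs. outside) and returns the most characteristic questions. The vote coding (1 = agree), the function names and the choice of this particular statistic are assumptions for illustration; the statistic actually used in the analysis is not specified here.

```python
from math import comb

import numpy as np


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Probability of at least `a` in the top-left cell when the row and
    column totals are fixed (upper tail of the hypergeometric distribution).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


def characteristic_questions(votes, in_cluster, top_k=3):
    """Rank questions by how strongly agreement concentrates in the cluster."""
    agree = votes == 1
    scores = []
    for q in range(votes.shape[1]):
        a = int(np.sum(agree[in_cluster, q]))  # in cluster, agree
        b = int(np.sum(~agree[in_cluster, q]))  # in cluster, not agree
        c = int(np.sum(agree[~in_cluster, q]))  # outside, agree
        d = int(np.sum(~agree[~in_cluster, q]))  # outside, not agree
        scores.append((fisher_greater(a, b, c, d), q))
    # Smallest p-value first: used as a ranking score, not as a significance test
    return [q for _, q in sorted(scores)[:top_k]]


# Hypothetical toy data: 8 people x 3 questions, the first 4 people form the cluster
votes = np.array(
    [
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, -1],
        [1, 1, -1],
        [-1, 1, 1],
        [-1, 1, 1],
        [-1, 1, -1],
        [-1, 1, -1],
    ]
)
in_cluster = np.array([True, True, True, True, False, False, False, False])
ranking = characteristic_questions(votes, in_cluster, top_k=3)
```

Question 0 ranks first because only cluster members agree with it; question 1 ranks last because everyone agrees with it, so it says nothing about the cluster. A symmetric pass over `votes == -1` would find characteristic disagreements in the same way.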
In the same way, if we run UMAP on the Public Opinion Map data and mark the points with party icons, we might be able to identify opinion clusters that no political party represents...
---
2024-10-31
nishio The second image is the characteristic opinion vector of the legislators in the range shown in the first image. https://gyazo.com/840f96feb977be80294a6dc75e5b9c31 https://gyazo.com/1ac5ee3e873990fc3b63f176b17fa3d7
nishio I've been saying that, rather than only having the AI cluster on its own, humans should also be able to specify "this range" for analysis.
nishio That means we've made it this far. https://gyazo.com/d230fddf28841c8e2f5d3c323dd64ca9
I'm curious what the difference of opinion is between the pure Communist island further to the upper left and the rest of this cluster.
Mathematically, it should be possible...
Technically...
I got stuck (or at least, it wasn't something I could knock out quickly).
I gave up on that and analyzed the clusters individually, which, surprisingly, achieved what I wanted.
I left the results here.
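The "mathematically it should be possible" part can be sketched as a direct comparison of two groups. This is only an illustration on hypothetical toy data: `opinion_gap` and `most_divisive` are made-up names, votes are assumed to be coded -1/0/1, and the difference in mean vote is just one simple choice of measure.

```python
import numpy as np


def opinion_gap(votes, group_a, group_b):
    """Per-question difference in mean vote between two groups of people."""
    return votes[group_a].mean(axis=0) - votes[group_b].mean(axis=0)


def most_divisive(votes, group_a, group_b, top_k=3):
    """Questions with the largest absolute gap, as (question index, gap) pairs."""
    gap = opinion_gap(votes, group_a, group_b)
    order = np.argsort(-np.abs(gap))[:top_k]
    return [(int(q), float(gap[q])) for q in order]


# Hypothetical toy data: 6 people x 3 questions
votes = np.array(
    [
        [1, 1, -1],
        [1, 1, -1],
        [1, -1, -1],
        [1, -1, 0],
        [1, -1, -1],
        [1, 0, -1],
    ],
    dtype=float,
)
pure_island = np.array([True, True, False, False, False, False])
rest_of_cluster = ~pure_island
divisive = most_divisive(votes, pure_island, rest_of_cluster, top_k=1)
```

In this toy case everyone agrees on question 0, so the whole gap between the islet and the rest of the cluster comes from question 1.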
---
This page is auto-translated from /nishio/pPolis2024-10-29 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.