Polis2024-05-13
Experiment to analyze data not derived from Polis
This page is an experimental log
Find the previous source code
I guess I didn't publish it.
Because it's not so much a repository.
Changing upstream data
This data is not Polis-derived, but 0/1 data, with verticals for users and horizontals for questions; unlike Polis, there is no missing data.
The cluster was easily formed.
https://gyazo.com/e9644ad9051f355672a0c1ada76fd9fc
Fix error where it gives details not just images.
code:diff
- for k in range(n_voters):
+ for k in range(len(clusters_K_star)):
This is a bug I didn't notice last time the data was too small, the error occurs when n_voters exceeds 100.
n_voters == 112, len(clusters_K_star) == 100
KeyError: 0
gpt.icon
Confirmation of data frame structure:
Check the columns and indexes of the matrix to understand in what format the data is stored.
Execute print(matrix.columns) and print(matrix.index) to check its structure.
Looking at the output results, the matrix columns are... String
Cause of KeyError: The original code matrix[c].iloc[idx_g] treats c as an integer, when in fact c must be a string.
Read the code
statements_all_in = range(8)
This, by the way, was a messy hard code last time.
Change to statements_all_in = matrix.columns.
The phenomenon that is occurring is
code:py
for c in range(N_comments):
So:.
code:py
comment = statements_all_inc # comment id is of the correct form
The output results are, well, convincing.
N=112
python t.py 2.28s user 6.43s system 558% cpu 1.560 total
https://gyazo.com/e9644ad9051f355672a0c1ada76fd9fc
N=1116
python t.py 2.96s user 5.94s system 613% cpu 1.449 total
https://gyazo.com/0430f910c32c23c127a8fe81f573a199https://gyazo.com/b332640c731edb5d8a4a7817ddbce546
N=2231
python t.py 8.20s user 9.98s system 803% cpu 2.261 total
N=4462
python t.py 7.22s user 9.72s system 730% cpu 2.318 total
N=11153
python t.py 13.42s user 20.98s system 617% cpu 5.574 total
https://gyazo.com/16271572fa1f387819883f0ecf1f47b8
N=111526
python t.py 19.73s user 22.84s system 724% cpu 5.879 total
https://gyazo.com/197096fa68831b70bf8d691153356179
You can analyze 110,000 cases of data in about 30 seconds.
2024-05-14
Last time, we used 18 different questions, "Does this matter?" only used 0/1, but this time we will convert the multiple choice for each question to One Hot
I don't usually use Panads, but I can easily write "things that can be written with for statements, but are a pain if they occur frequently".
gpt.icon
code:py
df = pd.DataFrame(data)
# Filter columns containing 'selectedIndex'
filtered_df = df.filter(like='selectedIndex')
# Create dummy variable (0/1 column)
df_dummies = pd.get_dummies(df, columns=selectedIndex_cols, prefix=selectedIndex_cols)
Easy One Hot Encoding of answers to multiple-choice questions
111526 rows x 74 columns
This is still about 40 seconds.
Whether there are 1,000 or 110,000 cases, the structure is almost the same
1000 items
https://scrapbox.io/files/664344c9d6b62a001cd4b58b.png
110,000
https://scrapbox.io/files/664344cb5c41730022b02d04.png
After all, large data tends to be in two clusters.
N=110,000, K=3.
It's not funny, so I limited the clusters to three, and this is what I got.
https://scrapbox.io/files/664344d01c9a04001db06894.png
Commentary by GPT
Based on the voting patterns and interests of each group, we suggest the following group names
Group 0: Cautious Evaluators
Rationale: This group consistently provides neutral or mild feedback to the "neither" or "0" option on questions about economic policy and security law. It represents a cautious and unbiased viewpoint.
Group 1: Pragmatic Supporters
Rationale: This group tends to support government initiatives such as tax increases, security legislation, and relocation of military bases. They demonstrate a realistic acceptance of government actions. Their voting patterns are significantly skewed toward positive or supportive responses.
Group 2: Constitutional Conservatives
Rationale: This group's concerns relate to adherence to the Constitution and a cautious stance toward military intervention and nuclear energy. It maintains constitutional norms against changes in state policy and expresses a skeptical perspective on the military role and the expansion of nuclear energy. It reflects a cautious and conservative approach.
These names are intended to reflect the voting patterns and primary concerns of each group and focus on responses to specific policy questions without assuming ideological motivations.
nishio.iconimpressions
Commentary adequately reflects the data.
"You just put 'neither' in the middle of a dichotomy!" I thought.
Forced 5-way split
N=110,000, K=5.
https://gyazo.com/f1e4df47808c3db06875529eae6bd8e5
gpt.iconBelow are the appropriate group names based on the different groups' responses to the Japanese political questions:
Group 0: Progressive Reformers
This group strongly opposes security legislation, including the right to collective self-defense, takes a position of not appreciating Abenomics, and supports an immediate phase-out of nuclear power. The group also opposes the relocation of the U.S. military base in Okinawa to Henoko, Nago City, and opposes the revision of Article 9 of the Constitution. These positions demonstrate a progressive and reformist approach that values fidelity to the Constitution and is cautious about military and nuclear policies.
Group 1: Conservative Guardians
Members of this group generally support security legislation that strengthens Japan's defense capabilities, appreciate Abenomics, support the relocation of U.S. military bases in Okinawa, and support amending the Constitution to clearly state the Self Defense Forces. They also support the promotion of tourism policies, including casinos, and are conservative, defense-oriented, and focused on economic growth.
Group 2: Cautious Moderates
This group is somewhat more moderate in its support for Abenomics and security legislation than Group 1, supporting initiatives such as casinos and base relocation, but cautious about fiscal policies such as tax increases. They have a balanced view that weighs potential benefits and social impacts.
Group 3: Indifferent Skeptics
They appear to be skeptical or difficult to make decisions on major issues, with noticeably neutral reactions on nuclear electric policy, and are not positive about major changes or strong opinions. This may reflect an attitude of indifference or low engagement.
Group 4: Critical observers
This group is highly critical of current policies such as Abenomics and security legislation and resists the relocation of U.S. military bases. Their critical stance on key economic and defense policies and their deep observation and skepticism of government actions are evident.
These labels are intended to capture the general trends and sentiments of each group based on their voting patterns and responses to key policy questions in Japan.
---
This page is auto-translated from /nishio/Polis2024-05-13 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.