
Multiple Categorical Variables in a Column & The Prep


I have survey data with free-text answers, categorical variables, and numeric variables.

I loaded it into a pandas DataFrame, but the multiple-choice columns sometimes hold more than one category per cell, because the survey question was designed as "choose all that apply".

For example:

ID  Category    Num1 Num2 Num3
1   A, B, C     1    1    1
2   B, C, D     1    0    1
3   A, C        1    1    1
4   A           0    1    1
5   A, C, D     0    1    1

I am trying to correlate these categories with the numerical variables.

For instance, the presence of A with the value of Num1.

But when I use the dataframe as it is, Python (and R) treats the whole cell, for example "A, B, C", as a single distinct category instead of three separate ones.

I think I need a way to parse (explode?) each cell behind the scenes before feeding the data into a statistical analysis function.
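One possible way to do that kind of parsing, sketched here on the example table only, is to turn each distinct answer into its own 0/1 indicator column with pandas' `Series.str.get_dummies`. The names `indicators`, `wide`, and `corr_a_num1` are made up for illustration, and the sketch assumes the answers are always separated by commas.

```python
import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Category": ["A, B, C", "B, C, D", "A, C", "A", "A, C, D"],
    "Num1": [1, 1, 1, 0, 0],
    "Num2": [1, 0, 1, 1, 1],
    "Num3": [1, 1, 1, 1, 1],
})

# Drop the spaces so "A, B" and "A,B" are treated alike (assumption:
# category labels themselves contain no spaces), then create one
# 0/1 column per distinct answer.
indicators = (
    df["Category"]
    .str.replace(" ", "", regex=False)
    .str.get_dummies(sep=",")
)

# Attach the indicator columns to the original rows.
wide = df.join(indicators)

# Pearson correlation between "A was selected" and Num1.
corr_a_num1 = wide["A"].corr(wide["Num1"])
print(wide)
print(corr_a_num1)
```

With both columns binary, this Pearson coefficient is the phi coefficient. A column that never varies, such as Num3 in this sample, has no defined correlation and would come out as NaN.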

How can I solve this problem?
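As a rough sketch of the "exploding" idea, again using only the example table, the cell can be split into a list and expanded with `DataFrame.explode` so that each row holds exactly one category. The names `long_df` and `mean_num1` are illustrative, and comma-separated answers are assumed.

```python
import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Category": ["A, B, C", "B, C, D", "A, C", "A", "A, C, D"],
    "Num1": [1, 1, 1, 0, 0],
    "Num2": [1, 0, 1, 1, 1],
    "Num3": [1, 1, 1, 1, 1],
})

# Split each cell into a list of answers, then give every answer its own row.
long_df = df.assign(Category=df["Category"].str.split(",")).explode("Category")

# Remove the whitespace left over from splitting on the bare comma.
long_df["Category"] = long_df["Category"].str.strip()

# Average Num1 among the respondents who selected each category.
mean_num1 = long_df.groupby("Category")["Num1"].mean()
print(mean_num1)
```

In this long layout a respondent appears once per selected answer, so it suits per-category summaries; row counts no longer equal the number of respondents.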

