I have survey data with text answers, categorical variables, and numeric variables.
I converted it into a pandas DataFrame, but the problem is that the multiple-choice columns sometimes contain more than one category, because the survey questions were designed as "choose all that apply".
For example:
ID  Category  Num1  Num2  Num3
1   A, B, C   1     1     1
2   B, C, D   1     0     1
3   A, C      1     1     1
4   A         0     1     1
5   A, C, D   0     1     1
I am trying to correlate these categories with the numeric variables, for example, the presence of A with the value of Num1.
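One way to relate the presence of A to Num1 is to turn A into a 0/1 indicator and compute the Pearson correlation, which for a binary variable is the point-biserial correlation. This is a minimal sketch with pandas, assuming the options in each cell are separated by ", " exactly as in the table above:

```python
import pandas as pd

# Example data copied from the table above
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Category": ["A, B, C", "B, C, D", "A, C", "A", "A, C, D"],
    "Num1": [1, 1, 1, 0, 0],
    "Num2": [1, 0, 1, 1, 1],
    "Num3": [1, 1, 1, 1, 1],
})

# Split each cell into a list of options, then flag rows whose list contains "A"
has_a = df["Category"].str.split(", ").apply(lambda opts: "A" in opts).astype(int)

# Pearson correlation of a 0/1 indicator with a numeric column
# (equivalent to the point-biserial correlation)
r = has_a.corr(df["Num1"])
print(round(r, 3))  # -0.408

# Mean of Num1 for respondents without (0) and with (1) option A
print(df.groupby(has_a.rename("has_A"))["Num1"].mean())
```

Splitting before testing membership avoids the false matches a substring check such as `str.contains("A")` could produce if one option's label contained another's. If a p-value is also needed, `scipy.stats.pointbiserialr(has_a, df["Num1"])` computes the same coefficient. With only five rows, the number is purely illustrative.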
But when I use the DataFrame as it is, Python (and R) treats a cell such as "A, B, C" as a category of its own: the whole cell is recognized as a single category.
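The behavior can be reproduced in pandas. Converting the raw column to the categorical dtype yields one level per distinct string, while splitting on ", " first yields the individual options. A small sketch on the example data:

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A, B, C", "B, C, D", "A, C", "A", "A, C, D"]})

# Raw strings: every distinct combination becomes its own level
raw_levels = list(df["Category"].astype("category").cat.categories)
print(raw_levels)  # ['A', 'A, B, C', 'A, C', 'A, C, D', 'B, C, D']

# Split first, then flatten: only the individual options remain
options = sorted(df["Category"].str.split(", ").explode().unique())
print(options)  # ['A', 'B', 'C', 'D']
```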
I think I need a way to parse (explode?) these values, ideally behind the scenes, before feeding them into a statistical analysis command.
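A common form of this parsing is one-hot (indicator) encoding. pandas' `Series.str.get_dummies` splits each cell on a separator and returns one 0/1 column per option, which can then be correlated with the numeric columns. This is a minimal sketch, assuming the separator is ", " as in the example table:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Category": ["A, B, C", "B, C, D", "A, C", "A", "A, C, D"],
    "Num1": [1, 1, 1, 0, 0],
    "Num2": [1, 0, 1, 1, 1],
    "Num3": [1, 1, 1, 1, 1],
})

# One 0/1 column per option (A, B, C, D); the Category column is left unchanged
dummies = df["Category"].str.get_dummies(sep=", ")

numeric_cols = ["Num1", "Num2", "Num3"]
wide = pd.concat([dummies, df[numeric_cols]], axis=1)

# Pearson correlations: options (rows) vs numeric variables (columns)
corr = wide.corr().loc[dummies.columns, numeric_cols]
print(corr.round(3))
```

In this sample, Num3 is constant, so its correlations come out as NaN. Because `dummies` keeps one row per respondent, it can also be passed directly as predictors to a regression.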
How can I solve this problem?
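An alternative is to reshape the data to long format with `DataFrame.explode`, which gives one row per respondent-option pair, and then summarize the numeric variables by option. A sketch on the example data, again assuming ", " as the separator:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Category": ["A, B, C", "B, C, D", "A, C", "A", "A, C, D"],
    "Num1": [1, 1, 1, 0, 0],
    "Num2": [1, 0, 1, 1, 1],
    "Num3": [1, 1, 1, 1, 1],
})

# Replace each cell with a list of options, then give each option its own row
long_df = df.assign(Category=df["Category"].str.split(", ")).explode("Category")

# Mean of each numeric variable among respondents who chose each option
means = long_df.groupby("Category")[["Num1", "Num2", "Num3"]].mean()
print(means)
```

Respondents who chose several options appear in several rows, so the long table suits per-option summaries like this one. For correlations or regressions where each respondent should count once, one indicator column per option (one row per respondent) is usually the safer input.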