Multiple Categorical Variables in a Column & The Prep

February 16, 2020, 9:11 am

I have survey data with text answers, categorical variables, and numeric variables.

I loaded it into a pandas DataFrame, but the multiple-choice columns sometimes hold more than one category per cell, because the survey questions were designed as "choose all that apply".

For example:

ID  Category    Num1 Num2 Num3
1   A, B, C     1    1    1
2   B, C, D     1    0    1
3   A, C        1    1    1
4   A           0    1    1
5   A, C, D     0    1    1

I am trying to correlate these categories with the numerical variables.

For example, the presence of A with the value of Num1.

But when I use the DataFrame as it is, Python (and R) treats the whole cell, for example "A, B, C", as one separate category instead of three.

I think I need a way to parse (explode?) the values, ideally behind the scenes, before feeding them into a statistical analysis command.

How can I solve this problem?
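
One possible approach, as a minimal sketch rather than a definitive answer: split each cell into 0/1 indicator columns with pandas' Series.str.get_dummies, then correlate an indicator with a numeric column. The DataFrame below just retypes the example table; the names df, indicators, long_form and corr_a_num1 are made up for illustration, and the separator is assumed to be exactly a comma followed by one space.

```python
import pandas as pd

# Rebuild the example table from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Category": ["A, B, C", "B, C, D", "A, C", "A", "A, C, D"],
    "Num1": [1, 1, 1, 0, 0],
    "Num2": [1, 0, 1, 1, 1],
    "Num3": [1, 1, 1, 1, 1],
})

# One 0/1 indicator column per category (assumes ", " is the separator).
indicators = df["Category"].str.get_dummies(sep=", ")

# Alternative long format: one row per (ID, category) pair.
long_form = df.assign(Category=df["Category"].str.split(", ")).explode("Category")

# Pearson correlation between "A was chosen" and Num1.
corr_a_num1 = indicators["A"].corr(df["Num1"])
print(indicators)
print(corr_a_num1)
```

The indicator columns can be joined back with df.join(indicators) if a single table is needed. Note that a column with no variation (Num3 in this example) gives an undefined correlation, and for 0/1 data a contingency table or chi-square test may be more appropriate than a correlation coefficient.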
